GDB 6.4 and translations

public inbox for gdb@sourceware.org

* GDB 6.4 and translations
@ 2004-11-03 22:02 Andrew Cagney
  2004-11-04  4:39 ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cagney @ 2004-11-03 22:02 UTC
  To: gdb

Just so everyone is on the same page with this challenge.

GDB 6.3 is going to include a .pot file:
  $ bunzip2 < gdb-6.2.90.tar.bz2 | tar tf - | grep gdb.pot
  gdb-6.2.90/gdb/po/gdb.pot
(it's generated as part of the release process) but won't include any 
corresponding translation.

Going on to 6.4, we'll need to quickly and efficiently:

- mark up a sizeable portion of the text
- ship the .pot file off for translation
(actually I've been asked to point at a download area - hence .50 and 
snapshots/ directory changes) (I've not yet pushed a gdb.pot file 
because it is effectively empty)
- wait for the translations
- integrate requested text changes and prototype text

and then iterate a bit.  There are two time consuming tasks here:

- mark up text
- translate

For the second of those - translate - the translation project 
[understandably] asks that they be allowed a significant amount of time 
(months not weeks) for their work.  Because of this, we'll need to 
complete the bulk of the mark-up (create .pot files) very early in the 
6.4 release cycle as otherwise we'll find ourselves either squeezing the
translation group's schedule, or having to let the translations slip out
to 6.4.1, 6.4.2, ....

enjoy,
Andrew


* Re: GDB 6.4 and translations
  2004-11-03 22:02 GDB 6.4 and translations Andrew Cagney
@ 2004-11-04  4:39 ` Eli Zaretskii
  0 siblings, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2004-11-04  4:39 UTC
  To: Andrew Cagney; +Cc: gdb

> Date: Wed, 03 Nov 2004 17:02:08 -0500
> From: Andrew Cagney <cagney@gnu.org>
> 
> For the second of those - translate - the translation project 
> [understandably] asks that they be allowed a significant amount of time 
> (months not weeks) for their work.  Because of this, we'll need to 
> complete the bulk of the mark-up (create .pot files) very early in the 
> 6.4 release cycle

FWIW, I don't see this as a significant problem: since the GDB code
changes seldom touch the parts that output text messages, we could
theoretically submit our .pot file as soon as we finish marking the
messages, even if that happens tomorrow.


* Re: GDB 6.4 and translations
  2004-11-04 15:22     ` Daniel Jacobowitz
  2004-11-04 16:21       ` Paul Schlie
@ 2004-11-04 21:35       ` Eli Zaretskii
  1 sibling, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2004-11-04 21:35 UTC
  To: Daniel Jacobowitz; +Cc: schlie, gdb

> Date: Thu, 4 Nov 2004 10:22:43 -0500
> From: Daniel Jacobowitz <drow@false.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, gdb@sources.redhat.com
> 
> Eli, the background is that GCC has adopted a new mechanism (the 'q'
> qualifier to its internal diagnostics machinery, which takes
> printf-like formats).  This allows GCC to output Unicode quotes when
> using the untranslated (i.e. English) messages - if the current locale
> supports them.

Thanks for explaining this.

FWIW, I think GCC guys were too quick to assume that UTF-8 locales are
good enough for this to be the default, but I agree that the fix is
easy enough for users who bump into this problem.

> People do still parse the CLI.  They will no matter what we tell them.
> Well, it's never been intended as a machine parseable interface

That's true, but I think that some use of CLI is inevitable even if
the front end uses MI as its main protocol.  However, I think that a
front end should set LC_MESSAGES=C before invoking an inferior GDB (or
perhaps GDB should do this internally when invoked with -interp=mi?),
and we should be careful not to translate any string printed by MI.
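
The front-end suggestion above can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular front end does it: the helper names `english_message_env` and `start_debugger` are hypothetical, and the debugger command line is left to the caller.

```python
import os
import subprocess


def english_message_env(base_env=None):
    """Return a copy of the environment that requests untranslated messages."""
    env = dict(os.environ if base_env is None else base_env)
    # LC_ALL, when set, overrides LC_MESSAGES, so it has to go. Dropping it
    # is a simplification: other locale categories it forced fall back to
    # their own variables or to LANG.
    env.pop("LC_ALL", None)
    # LANGUAGE is a GNU gettext extension for choosing translations; drop it
    # too so nothing asks for a non-English catalog.
    env.pop("LANGUAGE", None)
    env["LC_MESSAGES"] = "C"
    return env


def start_debugger(argv, base_env=None):
    """Start the caller-supplied debugger command line with pipes attached."""
    return subprocess.Popen(
        argv,
        env=english_message_env(base_env),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
```

Only the message category is forced to the C locale, so character handling (LC_CTYPE, via LANG) stays as the user configured it.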


* Re: GDB 6.4 and translations
  2004-11-04 15:34   ` Andrew Cagney
@ 2004-11-04 16:55     ` Paul Schlie
  0 siblings, 0 replies; 11+ messages in thread
From: Paul Schlie @ 2004-11-04 16:55 UTC
  To: Andrew Cagney, Eli Zaretskii; +Cc: gdb

> From: Andrew Cagney <cagney@gnu.org>
> 
> Eli Zaretskii wrote:
>>> Date: Wed, 03 Nov 2004 19:27:19 -0500
>>> From: Paul Schlie <schlie@comcast.net>
>>> 
>>> Although I don't know if it's been considered or even an issue, but it may
>>> be worth trying to avoid the use of Unicode's typographical quote characters
>>> in otherwise ASCII message string output on even Unicode supported platforms
>>> by default
>> 
>> 
>> Sorry, I have no idea what you are talking about; please consider
>> elaborating, e.g., by providing an example of such a problematic
>> message.
> 
> Yep, huh?  Can someone please post a concrete example of what this is
> all about?
> 
> Andrew

Basically GCC 4.0 has tentatively adopted a convention to allow the
specification of a quoted format specifier something basically like:

 printf("quoted %qX", some_value) => quoted "123", for example.

Which is arguably cleaner than attempting to escape embedded quotes
within format strings; but GCC has then further chosen to hard-code the
generation of Unicode left/right typographical quote characters in lieu
of vanilla ASCII quote characters by default if the locale environment
variables indicate that Unicode is supported, which may be taking
things too far, see: http://gcc.gnu.org/ml/gcc/2004-10/msg01271.html

(but the good news is that it may be relatively easily overridden by
modifying the locale environment variables seen by GCC prior to being invoked)
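
The locale-dependent choice described above can be illustrated with a small sketch. This is not GCC's implementation (which is C code inside its diagnostics machinery); the helper `quote` is hypothetical, and the ASCII fallback of plain apostrophes is an assumption, since the thread does not say which ASCII characters GCC uses.

```python
import codecs
import locale


def quote(text, encoding=None):
    """Wrap `text` in quotes suited to `encoding` (hypothetical helper)."""
    if encoding is None:
        # Ask the current locale which character encoding it prefers.
        encoding = locale.getpreferredencoding(False)
    try:
        # codecs.lookup() normalizes aliases such as "UTF8" or "utf_8".
        is_utf8 = codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        is_utf8 = False  # Unknown encoding: stay with ASCII to be safe.
    if is_utf8:
        return "\u2018" + text + "\u2019"  # typographic left/right quotes
    return "'" + text + "'"  # plain ASCII apostrophes (assumed fallback)
```

Calling `quote("123", "UTF-8")` yields typographic quotes, while `quote("123", "ISO-8859-1")` stays in ASCII, which is why changing the locale variables is enough to switch the behaviour off.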




* Re: GDB 6.4 and translations
  2004-11-04 15:22     ` Daniel Jacobowitz
@ 2004-11-04 16:21       ` Paul Schlie
  2004-11-04 21:35       ` Eli Zaretskii
  1 sibling, 0 replies; 11+ messages in thread
From: Paul Schlie @ 2004-11-04 16:21 UTC
  To: Daniel Jacobowitz; +Cc: Eli Zaretskii, gdb

> From: Daniel Jacobowitz <drow@false.org>
> 
> On Thu, Nov 04, 2004 at 12:52:43AM -0500, Paul Schlie wrote:
>> No, I don't think you're missing anything. I was simply speculating, being
>> ignorant of GDB's longer term internationalization plans, that it may be
>> wise to try to avoid the potential complications associated with the default
>> use of Unicode left/right quote character codes as tentatively chosen to be
>> used in GCC 4.0 quoted output message text on unicode supported platforms;
>> as although it may seem aesthetically pleasant, it's likely to create
>> otherwise unnecessary complications in circumstances where interface,
>> status, warning, and/or error messages may be parsed by subsequent tools
>> which may not be unicode aware.
> 
> And I think you're just as wrong here as you were when you said this on
> the GCC list.

- I accept that I may simply be wrong, or at least perceiving it to be
  more significant than it is; but please accept too that you may
  simply be wrong, or at least perceiving it to be a less significant
  issue than it is.

> Eli, the background is that GCC has adopted a new mechanism (the 'q'
> qualifier to its internal diagnostics machinery, which takes
> printf-like formats).  This allows GCC to output Unicode quotes when
> using the untranslated (i.e. English) messages - if the current locale
> supports them.  A user with UTF-8 locales and non-UTF-8 terminals
> complained, and Paul also objected on machine-parseability grounds.  So
> fix your locale and move on... I think the nicety of providing the
> quote characters the user's locale requested is a very nice touch.

- I too like the quoted format specifiers but, as above, simply don't see
  any value in hard-coding alternative typographical quote characters.
  Many text display programs substitute matching left and right quotes
  in a context-dependent way when the text is displayed, without needing
  them hard-coded. That is typically how it's done: most keyboards, for
  example, have only one quote character key, yet typographically neutral
  text can still be displayed as desired without much difficulty.

>> Where given your statements, it doesn't seem to be part of GDB's present
>> plans, which I suspect is good; but still suspect that any translated
>> message text containing ASCII symbols which are anticipated to be
>> potentially utilized by other programs for whatever purpose, should likely
>> retain the original ASCII symbol codes in the text where possible by default
>> (even on Unicode platforms) to prevent potential subsequent complications,
>> if there's a choice in the matter.
> 
> People do still parse the CLI.  They will no matter what we tell them.
> Well, it's never been intended as a machine parseable interface (that's
> what MI is for nowadays), so if they have to add locale workarounds I'm
> entirely unsympathetic.

- I guess time will tell...

> -- 
> Daniel Jacobowitz

Thanks, -paul-



* Re: GDB 6.4 and translations
  2004-11-04  4:52 ` Eli Zaretskii
  2004-11-04  5:52   ` Paul Schlie
@ 2004-11-04 15:34   ` Andrew Cagney
  2004-11-04 16:55     ` Paul Schlie
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Cagney @ 2004-11-04 15:34 UTC
  To: Eli Zaretskii, Paul Schlie; +Cc: gdb

Eli Zaretskii wrote:
>>Date: Wed, 03 Nov 2004 19:27:19 -0500
>>From: Paul Schlie <schlie@comcast.net>
>>
>>Although I don't know if it's been considered or even an issue, but it may
>>be worth trying to avoid the use of Unicode's typographical quote characters
>>in otherwise ASCII message string output on even Unicode supported platforms
>>by default
> 
> 
> Sorry, I have no idea what you are talking about; please consider
> elaborating, e.g., by providing an example of such a problematic
> message.

Yep, huh?  Can someone please post a concrete example of what this is 
all about?

Andrew


* Re: GDB 6.4 and translations
  2004-11-04  5:52   ` Paul Schlie
  2004-11-04  8:11     ` Fabian Cenedese
@ 2004-11-04 15:22     ` Daniel Jacobowitz
  2004-11-04 16:21       ` Paul Schlie
  2004-11-04 21:35       ` Eli Zaretskii
  1 sibling, 2 replies; 11+ messages in thread
From: Daniel Jacobowitz @ 2004-11-04 15:22 UTC
  To: Paul Schlie; +Cc: Eli Zaretskii, gdb

On Thu, Nov 04, 2004 at 12:52:43AM -0500, Paul Schlie wrote:
> No, I don't think you're missing anything. I was simply speculating, being
> ignorant of GDB's longer term internationalization plans, that it may be
> wise to try to avoid the potential complications associated with the default
> use of Unicode left/right quote character codes as tentatively chosen to be
> used in GCC 4.0 quoted output message text on unicode supported platforms;
> as although it may seem aesthetically pleasant, it's likely to create
> otherwise unnecessary complications in circumstances where interface,
> status, warning, and/or error messages may be parsed by subsequent tools
> which may not be unicode aware.

And I think you're just as wrong here as you were when you said this on
the GCC list.

Eli, the background is that GCC has adopted a new mechanism (the 'q'
qualifier to its internal diagnostics machinery, which takes
printf-like formats).  This allows GCC to output Unicode quotes when
using the untranslated (i.e. English) messages - if the current locale
supports them.  A user with UTF-8 locales and non-UTF-8 terminals
complained, and Paul also objected on machine-parseability grounds.  So
fix your locale and move on... I think the nicety of providing the
quote characters the user's locale requested is a very nice touch.

> Where given your statements, it doesn't seem to be part of GDB's present
> plans, which I suspect is good; but still suspect that any translated
> message text containing ASCII symbols which are anticipated to be
> potentially utilized by other programs for whatever purpose, should likely
> retain the original ASCII symbol codes in the text where possible by default
> (even on Unicode platforms) to prevent potential subsequent complications,
> if there's a choice in the matter.

People do still parse the CLI.  They will no matter what we tell them.
Well, it's never been intended as a machine parseable interface (that's
what MI is for nowadays), so if they have to add locale workarounds I'm
entirely unsympathetic.

-- 
Daniel Jacobowitz
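
For tools that parse human-readable output anyway, one possible locale workaround of the kind mentioned above is to fold typographic quotes back to ASCII before matching. A minimal sketch, assuming the output has already been decoded to text; the names `ASCII_QUOTES` and `normalize_quotes` are hypothetical.

```python
# Map typographic quote code points to their ASCII counterparts.
ASCII_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def normalize_quotes(line):
    """Return `line` with typographic quotes replaced by ASCII quotes."""
    return line.translate(ASCII_QUOTES)
```

A parser that runs each line through `normalize_quotes` first can keep its existing ASCII-only patterns regardless of the locale the producing tool ran under.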


* Re: GDB 6.4 and translations
  2004-11-04  5:52   ` Paul Schlie
@ 2004-11-04  8:11     ` Fabian Cenedese
  2004-11-04 15:22     ` Daniel Jacobowitz
  1 sibling, 0 replies; 11+ messages in thread
From: Fabian Cenedese @ 2004-11-04  8:11 UTC
  To: gdb


>> Or did I miss something?
>
>No, I don't think you're missing anything. I was simply speculating, being
>ignorant of GDB's longer term internationalization plans, that it may be
>wise to try to avoid the potential complications associated with the default
>use of Unicode left/right quote character codes as tentatively chosen to be
>used in GCC 4.0 quoted output message text on unicode supported platforms;
>as although it may seem aesthetically pleasant, it's likely to create
>otherwise unnecessary complications in circumstances where interface,
>status, warning, and/or error messages may be parsed by subsequent tools
>which may not be unicode aware.
>
>Where given your statements, it doesn't seem to be part of GDB's present
>plans, which I suspect is good; but still suspect that any translated
>message text containing ASCII symbols which are anticipated to be
>potentially utilized by other programs for whatever purpose, should likely
>retain the original ASCII symbol codes in the text where possible by default
>(even on Unicode platforms) to prevent potential subsequent complications,
>if there's a choice in the matter.

Wow... though I think I understood what you were saying... you know...
it IS possible to make a paragraph out of several sentences :) Not
everyone is a native English speaker, so simpler sentences may be more
easily understood by other people. (I think I'll need to read that again :)

bye  Fabi



* Re: GDB 6.4 and translations
  2004-11-04  4:52 ` Eli Zaretskii
@ 2004-11-04  5:52   ` Paul Schlie
  2004-11-04  8:11     ` Fabian Cenedese
  2004-11-04 15:22     ` Daniel Jacobowitz
  2004-11-04 15:34   ` Andrew Cagney
  1 sibling, 2 replies; 11+ messages in thread
From: Paul Schlie @ 2004-11-04  5:52 UTC
  To: Eli Zaretskii; +Cc: gdb

> From: Eli Zaretskii <eliz@gnu.org>
> 
>> Date: Wed, 03 Nov 2004 19:27:19 -0500
>> From: Paul Schlie <schlie@comcast.net>
>> 
>> Although I don't know if it's been considered or even an issue, but it may
>> be worth trying to avoid the use of Unicode's typographical quote characters
>> in otherwise ASCII message string output on even Unicode supported platforms
>> by default
> 
> Sorry, I have no idea what you are talking about; please consider
> elaborating, e.g., by providing an example of such a problematic
> message.
> 
> AFAIK, we don't use any non-ASCII characters in the GDB message text.
> If you know about any use of such characters in GDB, please point out
> where in the code we have them, since I believe that must be some bug.
> 
> As for the translated messages, it's entirely up to the translators'
> teams to decide how they encode the text in their language.  If they
> decide to use UTF-8 or some other Unicode encoding (and use Unicode
> quoting characters), there's no way we could prevent them from doing
> so.  Nor do I think we should: the translators know better than we do
> what characters are supported by end-user platforms in their locale.
> 
>> especially for text which may likely be subsequently parsed by tools likely
>> benefiting, and/or depending on the use of plain old ASCII quote characters.
> 
> If you are talking about GDB GUI front ends, they should invoke GDB
> after setting the Posix locale anyway, since they want the messages in
> English to be able to parse them.  Thus, if the original messages we
> have in the code are in plain ASCII, the front ends will not have any
> problems here.
> 
> Or did I miss something?

No, I don't think you're missing anything. I was simply speculating, being
ignorant of GDB's longer term internationalization plans, that it may be
wise to try to avoid the potential complications associated with the default
use of Unicode left/right quote character codes as tentatively chosen to be
used in GCC 4.0 quoted output message text on unicode supported platforms;
as although it may seem aesthetically pleasant, it's likely to create
otherwise unnecessary complications in circumstances where interface,
status, warning, and/or error messages may be parsed by subsequent tools
which may not be unicode aware.

Where given your statements, it doesn't seem to be part of GDB's present
plans, which I suspect is good; but still suspect that any translated
message text containing ASCII symbols which are anticipated to be
potentially utilized by other programs for whatever purpose, should likely
retain the original ASCII symbol codes in the text where possible by default
(even on Unicode platforms) to prevent potential subsequent complications,
if there's a choice in the matter.


* Re: GDB 6.4 and translations
  2004-11-04  0:27 Paul Schlie
@ 2004-11-04  4:52 ` Eli Zaretskii
  2004-11-04  5:52   ` Paul Schlie
  2004-11-04 15:34   ` Andrew Cagney
  0 siblings, 2 replies; 11+ messages in thread
From: Eli Zaretskii @ 2004-11-04  4:52 UTC
  To: Paul Schlie; +Cc: gdb

> Date: Wed, 03 Nov 2004 19:27:19 -0500
> From: Paul Schlie <schlie@comcast.net>
> 
> Although I don't know if it's been considered or even an issue, but it may
> be worth trying to avoid the use of Unicode's typographical quote characters
> in otherwise ASCII message string output on even Unicode supported platforms
> by default

Sorry, I have no idea what you are talking about; please consider
elaborating, e.g., by providing an example of such a problematic
message.

AFAIK, we don't use any non-ASCII characters in the GDB message text.
If you know about any use of such characters in GDB, please point out
where in the code we have them, since I believe that must be some bug.

As for the translated messages, it's entirely up to the translators'
teams to decide how they encode the text in their language.  If they
decide to use UTF-8 or some other Unicode encoding (and use Unicode
quoting characters), there's no way we could prevent them from doing
so.  Nor do I think we should: the translators know better than we do
what characters are supported by end-user platforms in their locale.

> especially for text which may likely be subsequently parsed by tools likely
> benefiting, and/or depending on the use of plain old ASCII quote characters.

If you are talking about GDB GUI front ends, they should invoke GDB
after setting the Posix locale anyway, since they want the messages in
English to be able to parse them.  Thus, if the original messages we
have in the code are in plain ASCII, the front ends will not have any
problems here.

Or did I miss something?


* Re: GDB 6.4 and translations
@ 2004-11-04  0:27 Paul Schlie
  2004-11-04  4:52 ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Schlie @ 2004-11-04  0:27 UTC
  To: gdb

I don't know if it's been considered or is even an issue, but it may
be worth trying to avoid the use of Unicode's typographical quote characters
in otherwise ASCII message string output by default, even on platforms that
support Unicode; the potential complications and/or confusion resulting from
such a default choice are likely not worth any perceived benefit,
especially for text which may subsequently be parsed by tools benefiting
from, and/or depending on, the use of plain old ASCII quote characters.

(even translated message text which can't be represented in ASCII would
 likely benefit from plain old ASCII quote delimiters, which help keep
 its subsequent machine parsing simple and indifferent to the message's
 encoding)

The above comment does not necessarily apply to documentation; although any
text which is meant to represent program code should also likely limit its
quote character use to plain old ' and " pairs by default, as expected by
most programming language parsers; just as most mark-up languages assume for
text designated as being intended as "code".

Maybe the FSF should consider the specification of a policy which recognizes
the difference between text intended primarily for human consumption, vs
message text which may just as likely be parsed by integrated development
tools, regardless of Unicode support.


Thread overview: 11+ messages
2004-11-03 22:02 GDB 6.4 and translations Andrew Cagney
2004-11-04  4:39 ` Eli Zaretskii
2004-11-04  0:27 Paul Schlie
2004-11-04  4:52 ` Eli Zaretskii
2004-11-04  5:52   ` Paul Schlie
2004-11-04  8:11     ` Fabian Cenedese
2004-11-04 15:22     ` Daniel Jacobowitz
2004-11-04 16:21       ` Paul Schlie
2004-11-04 21:35       ` Eli Zaretskii
2004-11-04 15:34   ` Andrew Cagney
2004-11-04 16:55     ` Paul Schlie
