public inbox for gcc@gcc.gnu.org
* Re: Ability to disable URL mangling in makeinfo 4.7?
       [not found] <Pine.BSF.4.61.0408181309370.15562@acrux.dbai.tuwien.ac.at>
@ 2004-09-05 21:12 ` Gerald Pfeifer
  2004-09-05 21:32   ` Karl Berry
  0 siblings, 1 reply; 11+ messages in thread
From: Gerald Pfeifer @ 2004-09-05 21:12 UTC (permalink / raw)
  To: bug-texinfo; +Cc: gcc

I haven't seen any response to this.  This is really causing quite
a bit of trouble for us on gcc.gnu.org, so any help would be appreciated.

On Wed, 18 Aug 2004, Gerald Pfeifer wrote:
> makeinfo 4.7 changed the mangling of special characters like '*' and
> '+' in URLs.
>
> Instead of
>  http://gcc.gnu.org/onlinedocs/gcc/C---Misunderstandings.html
> we now have
>  http://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Misunderstandings.html.
> and instead of anchors for target triplets like
>  #alpha*-*-*
> we now have
>  #alpha_002a_002d_002a_002d_002a.
>
> I'm sure you will see how this is hurting us for the GCC web pages.
>
>
> Is there any way to request the old way, that is, keep "*" unchanged
> and use "-" or something similar for problematic characters?
>
> If there is no way yet, would you mind adding such an option to your
> next release?

Gerald
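
The renaming described above can be reproduced with a small sketch.
The function name `mangle_node_name` is made up for illustration, and
the rule it applies (keep ASCII letters and digits, turn a space into
"-", replace every other character with "_" plus four lowercase hex
digits of its code point) is inferred from the examples in this
message, not taken from the makeinfo sources.

```python
def mangle_node_name(name: str) -> str:
    """Hypothetical reconstruction of the makeinfo 4.7 name mangling.

    Assumption: the rule is inferred from the examples in the thread,
    not copied from makeinfo itself.
    """
    out = []
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch)                 # letters and digits pass through
        elif ch == " ":
            out.append("-")                # a space becomes a hyphen
        else:
            out.append(f"_{ord(ch):04x}")  # e.g. '+' -> _002b, '*' -> _002a
    return "".join(out)


print(mangle_node_name("C++ Misunderstandings"))
print(mangle_node_name("alpha*-*-*"))
```

Under these assumptions the two calls print the new-style names quoted
above, C_002b_002b-Misunderstandings and alpha_002a_002d_002a_002d_002a.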

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-05 21:12 ` Ability to disable URL mangling in makeinfo 4.7? Gerald Pfeifer
@ 2004-09-05 21:32   ` Karl Berry
  2004-09-05 21:47     ` Gerald Pfeifer
  0 siblings, 1 reply; 11+ messages in thread
From: Karl Berry @ 2004-09-05 21:32 UTC (permalink / raw)
  To: gerald; +Cc: bug-texinfo, gcc

Hi Gerald,

    I haven't seen any response to this.  This is really causing quite
    a bit of trouble for us on gcc.gnu.org, so any help would be appreciated.

I saw two answers, from Patrice and Stepan:
http://lists.gnu.org/archive/html/bug-texinfo/2004-08/msg00046.html
http://lists.gnu.org/archive/html/bug-texinfo/2004-08/msg00047.html

In essence, sorry, you need to regenerate your documents.  There were
ambiguities in the way the mangling was done before.  I do not
anticipate any need to change it again.

karl

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-05 21:32   ` Karl Berry
@ 2004-09-05 21:47     ` Gerald Pfeifer
  2004-09-06  0:34       ` Karl Berry
  0 siblings, 1 reply; 11+ messages in thread
From: Gerald Pfeifer @ 2004-09-05 21:47 UTC (permalink / raw)
  To: Karl Berry; +Cc: bug-texinfo, gcc

On Sun, 5 Sep 2004, Karl Berry wrote:
> I saw two answers, from Patrice and Stepan:

Thanks.  I'm not on the bug-texinfo list, and thus missed those.

> In essence, sorry, you need to regenerate your documents.

I strongly disagree with the arguments in that thread:

  - By making that change, and regenerating our documents, lots of
    external links (from gcc.gnu.org itself and others) as well as
    bookmarks are broken.

  - The new mangling schema might be okay for machines, but it is
    absolutely ugly for humans.

    As an example, we have anchors of the form
      *-ibm-aix*
    which according to the new schema would become
      _002a_002dibm_002daix_002a (or similar).

    I really want to retain
      http://gcc.gnu.org/install/specific.html#*-ibm-aix*
    which we have used for many years, instead of the far uglier
      http://gcc.gnu.org/install/specific.html#_002a_002dibm_002daix_002a

Currently, I am using a simple `sed -e 's/_002d/-/g' -e 's/_002a/*/g'`
for  the GCC web pages, and given your feedback I assume I'll also need
to apply this to all of our (other) texinfo-based documents. :-(

Gerald
-- 
Gerald Pfeifer (Jerry)   gerald@pfeifer.com   http://www.pfeifer.com/gerald/
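
For readers who want to experiment with the workaround outside of sed,
the two substitutions can be sketched in Python.  The function name
`undo_star_and_hyphen` is hypothetical; the replacements and their
order are the ones in the sed command quoted above.

```python
def undo_star_and_hyphen(text: str) -> str:
    """Apply the same two replacements as the sed command, in the same order."""
    text = text.replace("_002d", "-")  # s/_002d/-/g
    text = text.replace("_002a", "*")  # s/_002a/*/g
    return text


url = "http://gcc.gnu.org/install/specific.html#_002a_002dibm_002daix_002a"
print(undo_star_and_hyphen(url))
```

Like the sed command, this rewrites every occurrence of the two
sequences in the text it is given, not only those inside anchor names.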

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-05 21:47     ` Gerald Pfeifer
@ 2004-09-06  0:34       ` Karl Berry
  2004-09-14 23:42         ` Gerald Pfeifer
  0 siblings, 1 reply; 11+ messages in thread
From: Karl Berry @ 2004-09-06  0:34 UTC (permalink / raw)
  To: gerald; +Cc: bug-texinfo, gcc

    Thanks.  I'm not on the bug-texinfo list, and thus missed those.

Sorry, that was a mistake.  I didn't realize you weren't on the
recipient list.

    I strongly disagree with the arguments in that thread:

I'm sorry to hear it.

  - The new mangling schema might be okay for machines, but it is
    absolutely ugly for humans.

I agree it is ugly, which is quite unfortunate, but it is unambiguous.
Don't you think it's more important to be correct than pretty?  Do you
have an alternative scheme to propose?  Given that we have only a-z0-9_-
to work with, I don't see a way to do better.

  - By making that change, and regenerating our documents, lots of
    external links (from gcc.gnu.org itself and others) as well as
    bookmarks are broken.

I realize that this is a problem.  I suppose we could add anchors with
the old scheme, so that existing links would not be broken.  Would that
satisfy you?

    Currently, I am using a simple `sed -e 's/_002d/-/g' -e 's/_002a/*/g'`
    for  the GCC web pages, and given your feedback I assume I'll also need
    to apply this to all of our (other) texinfo-based documents. :-(

This is not a good solution, as I'm sure you'll agree, because it
creates a needless incompatibility between your documents and everyone
else's, and any Texinfo document that tried to link to yours would fail,
not to mention it makes more work for you.  Can't we find an approach
that we can both agree to?

k

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-06  0:34       ` Karl Berry
@ 2004-09-14 23:42         ` Gerald Pfeifer
  2004-09-15  0:15           ` Karl Berry
  0 siblings, 1 reply; 11+ messages in thread
From: Gerald Pfeifer @ 2004-09-14 23:42 UTC (permalink / raw)
  To: Karl Berry; +Cc: bug-texinfo, gcc

On Sun, 5 Sep 2004, Karl Berry wrote:
> I agree it is ugly, which is quite unfortunate, but it is unambiguous.
> Don't you think it's more important to be correct than pretty?  Do you
> have an alternative scheme to propose?  Given that we have only a-z0-9_-
> to work with, I don't see a way to do better.

We have been using things like #alpha*-*-* for many years, without
getting a single complaint.  I guess that's why I'm not too happy
about the makeinfo mangling change, even though I can see where you
are coming from.

>> - By making that change, and regenerating our documents, lots of
>>   external links (from gcc.gnu.org itself and others) as well as
>>   bookmarks are broken.
>
> I realize that this is a problem.  I suppose we could add anchors with
> the old scheme, so that existing links would not be broken.  Would that
> satisfy you?

Yes, that'd be very useful!

>> Currently, I am using a simple `sed -e 's/_002d/-/g' -e 's/_002a/*/g'`
>> for the GCC web pages, and given your feedback I assume I'll also need
>>  to apply this to all of our (other) texinfo-based documents. :-(
> This is not a good solution, as I'm sure you'll agree, because it
> creates a needless incompatibility between your documents and everyone
> else's, and any Texinfo document that tried to link to yours would fail,
> not to mention it makes more work for you.  Can't we find an approach
> that we can both agree to?

Are we restricted to the set of a-z0-9_- also for anchors?  Given our
experience, all web clients seem to support at least '*' as well.

If '-' is part of the supported set of characters, why do you rewrite
that as well?  Could this be avoided?

Gerald (sorry for the delay in responding to this; I'll be fully offline
 	for three weeks starting Sunday)

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-14 23:42         ` Gerald Pfeifer
@ 2004-09-15  0:15           ` Karl Berry
  2004-09-15  3:26             ` Andreas Schwab
  0 siblings, 1 reply; 11+ messages in thread
From: Karl Berry @ 2004-09-15  0:15 UTC (permalink / raw)
  To: gerald; +Cc: bug-texinfo, gcc

    Yes, that'd be very useful!

Ok, we can work on that.  (Alper, if you feel like it, go for it...)

    Are we restricted to the set of a-z0-9_- also for anchors?  Given our
    experience, all web clients seem to support at least '*' as well.

I can believe that other characters are supported in anchor names
(although I thought that XML had very restrictive rules).  In any case,
it seems simpler to map a node name to the same string, whether it is
used as an anchor name or a filename, instead of having different rules.

    If '-' is part of the supported set of characters, why do you rewrite
    that as well?  Could this be avoided?

The problem is that we use - to map spaces.  So it can't also be used
for itself.  This is probably the most annoying thing about the new
scheme, but otherwise "node-name" and "node name" would map to the same
target.  Doesn't seem good.  I think there may have even been a case or
two in practice.

Hope you enjoy your time away from the computers ...

karl
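
The collision described above can be illustrated with a toy mapping.
Everything here is an assumption made for the sake of the example: the
function `mangle` and its `escape_hyphen` switch are hypothetical, and
the escaping rule is inferred from the examples earlier in the thread.

```python
def mangle(name: str, escape_hyphen: bool) -> str:
    """Toy mapping: spaces become '-', other specials become _XXXX.

    escape_hyphen=False models the alternative asked about, where a
    literal '-' in the node name is left alone.
    """
    out = []
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif ch == " ":
            out.append("-")
        elif ch == "-" and not escape_hyphen:
            out.append("-")
        else:
            out.append(f"_{ord(ch):04x}")
    return "".join(out)


for escape in (False, True):
    a = mangle("node-name", escape)
    b = mangle("node name", escape)
    print(escape, a, b, "collision" if a == b else "distinct")
```

With the hyphen left alone, both node names come out as node-name; with
the hyphen escaped, they come out as node_002dname and node-name.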

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-15  0:15           ` Karl Berry
@ 2004-09-15  3:26             ` Andreas Schwab
  2004-09-15  7:07               ` Stepan Kasal
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Schwab @ 2004-09-15  3:26 UTC (permalink / raw)
  To: Karl Berry; +Cc: gerald, bug-texinfo, gcc

karl@freefriends.org (Karl Berry) writes:

>     Are we restricted to the set of a-z0-9_- also for anchors?  Given our
>     experience, all web clients seem to support at least '*' as well.
>
> I can believe that other characters are supported in anchor names
> (although I thought that XML had very restrictive rules).  In any case,
> it seems simpler to map a node name to the same string, whether it is
> used as an anchor name or a filename, instead of having different rules.

I may be missing something, but isn't %xy the preferred way to quote
special characters in URLs?

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-15  3:26             ` Andreas Schwab
@ 2004-09-15  7:07               ` Stepan Kasal
  2004-09-15  7:59                 ` Eli Zaretskii
  2004-09-17  1:39                 ` Gerald Pfeifer
  0 siblings, 2 replies; 11+ messages in thread
From: Stepan Kasal @ 2004-09-15  7:07 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Karl Berry, gcc, gerald, bug-texinfo

Hello Andreas,

On Wed, Sep 15, 2004 at 01:42:30AM +0200, Andreas Schwab wrote:
> karl@freefriends.org (Karl Berry) writes:
> 
> >     Are we restricted to the set of a-z0-9_- also for anchors?  Given our
> >     experience, all web clients seem to support at least '*' as well.
> >
> > I can believe that other characters are supported in anchor names
> > (although I thought that XML had very restrictive rules).  In any case,
> > it seems simpler to map a node name to the same string, whether it is
> > used as an anchor name or a filename, instead of having different rules.
> 
> I may be missing something, but isn't %xy the preferred way to quote
> special characters in URLs?

There are two problems:

1) Mangling of the node names for filenames.
The main problem here isn't how to cite the URL, but how to name the
physical file.
Various file systems have various limitations on filenames, and we have
to use the intersection of them all.

This is also the most annoying problem, when you have

http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html

instead of, e.g.,

http://www.gnu.org/software/gawk/manual/html_node/Statements-Lines.html

Note: If you typed
http://www.gnu.org/software/gawk/manual/html_node/Statements%2fLines.html
into your browser, you'd still be asking for the file Lines.html in the
subdirectory Statements.

2) Mangling of the node names for anchor names.

Yes, it seems you are right, anchor names can contain any characters.
(I quickly looked at various documents on www.w3c.org.  I'm not an expert,
so forgive me if I'm mistaken.)

When defining/using them, you have to use the proper escaping.
This escaping would be character entities (&#x2f;) in HTML 4.01, and
percent-escaping (%2f) in XML.  But this is a problem for programmers.

So, you could have:
http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html#Statements/Lines
instead of
http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html#Statements_002fLines

Would this help so much?

And, of course, the real browsers would surely differ from what the
standards require.

Stepan
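
The point about %2f in problem 1 can be checked with Python's standard
library.  This is only a sketch of the decoding step: the helper name
`split_decoded_path` is made up, and whether a given web server decodes
%2f before looking up the file is an assumption that varies between
servers.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def split_decoded_path(url: str) -> tuple[str, str]:
    """Percent-decode the URL path, then split it into directory and file."""
    decoded = unquote(urlsplit(url).path)  # %2f becomes a literal '/'
    return posixpath.split(decoded)


url = "http://www.gnu.org/software/gawk/manual/html_node/Statements%2fLines.html"
directory, filename = split_decoded_path(url)
print(directory)
print(filename)
```

Once decoded, the path ends in .../html_node/Statements with the file
name Lines.html, so percent-escaping alone does not yield a single
file called "Statements/Lines.html".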

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-15  7:07               ` Stepan Kasal
@ 2004-09-15  7:59                 ` Eli Zaretskii
  2004-09-17  1:39                 ` Gerald Pfeifer
  1 sibling, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2004-09-15  7:59 UTC (permalink / raw)
  To: Stepan Kasal; +Cc: schwab, gcc, gerald, bug-texinfo, karl

> Date: Wed, 15 Sep 2004 08:06:35 +0200
> From: Stepan Kasal <kasal@ucw.cz>
> Cc: gcc@gcc.gnu.org, gerald@pfeifer.com, bug-texinfo@gnu.org,
> 	Karl Berry <karl@freefriends.org>
> 
> 2) Mangling of the node names for anchor names.
> 
> Yes, it seems you are right, anchor names can contain any characters.

Don't we create file names for anchors as well, at least under some
combination of command-line switches (I simply don't remember)?  If we
do, we cannot use any characters that we don't allow in mangled node
names.

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-15  7:07               ` Stepan Kasal
  2004-09-15  7:59                 ` Eli Zaretskii
@ 2004-09-17  1:39                 ` Gerald Pfeifer
  2004-09-17 14:50                   ` Jamie Lokier
  1 sibling, 1 reply; 11+ messages in thread
From: Gerald Pfeifer @ 2004-09-17  1:39 UTC (permalink / raw)
  To: Stepan Kasal; +Cc: Andreas Schwab, Karl Berry, gcc, bug-texinfo

On Wed, 15 Sep 2004, Stepan Kasal wrote:
> So, you could have:
> http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html#Statements/Lines
> instead of
> http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html#Statements_002fLines
>
> Would this help so much?

I guess that would be somewhat confusing since it looks like a
subdirectory structure.

Gerald

* Re: Ability to disable URL mangling in makeinfo 4.7?
  2004-09-17  1:39                 ` Gerald Pfeifer
@ 2004-09-17 14:50                   ` Jamie Lokier
  0 siblings, 0 replies; 11+ messages in thread
From: Jamie Lokier @ 2004-09-17 14:50 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Stepan Kasal, Andreas Schwab, Karl Berry, gcc, bug-texinfo

Gerald Pfeifer wrote:
> >http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html#Statements/Lines
> >instead of
> >http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html#Statements_002fLines
> >
> >Would this help so much?
> 
> I guess that would be somewhat confusing since it looks like a
> subdirectory structure.

Some programs may count the fragment's "/" when resolving
relative URLs such as "..".

I know that is a problem with a few old browsers when "/" appears in the
query part of a URL.  I don't know if it's ever a problem when "/"
appears in a fragment identifier.

However, "/" isn't allowed in some kinds of fragment identifiers for
another reason: XML and HTML "id" syntax is restricted.  From HTML 4.01:

   ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
   followed by any number of letters, digits ([0-9]), hyphens ("-"),
   underscores ("_"), colons (":"), and periods (".").

The "id" attribute, which is one way to name target anchors in HTML,
must be an ID token in correct HTML 4.

The "name" attribute, which is another way to name target anchors in
HTML, has CDATA syntax ("name" is not a NAME token, in case the above
quote confused matters).  That syntax is much more flexible.  The
following ASCII characters are fine in that syntax, as are most
non-ASCII characters.

    [-_:.A-Za-z0-9;/?@&=+$,!~*'() <>#%"{}|\\^\[\]`]

Of those, the following printable ASCII characters, as well as all
control characters and non-ASCII characters, must be %-escaped when
they're used in a fragment reference:

    [ <>#%"{}|\\^\[\]`]

In other words, __in HTML__ (and _not_ XHTML), you can use any
non-control character in a "name" anchor, including spaces, "-" and
"*".  You are very restricted with "id" anchors.

This difference is mentioned in the HTML 4 spec, as one reason why you
might choose to use "name" anchors instead of "id".

In XHTML 1.0, the "name" attribute must have NmToken syntax.
In XML 1.0, the "id" attribute must have Name syntax.

It is an amusing set of inconsistencies, which means that if you want
to serve your document as XHTML 1.0, then you can only use these
characters in a "name" anchor (XML NmToken syntax):

    [-_:.A-Za-z0-9] plus some non-ASCII characters (CombiningChar | Extender)

If you want to do something with XML "id" attributes, which is where
XHTML is heading, and you still want the page to be valid HTML, then
you're restricted to the intersection of HTML's constraint on "id"
(the ID token in the first quoted paragraph above) and XML's
constraint on "id" (XML Name syntax: as "name" for XHTML above, with
additional constraints on the first character).

That intersection is strictly:

    [A-Za-z] for first char, followed by [-_:.A-Za-z0-9]

I would stick to that for compatibility with everything including
current HTML and future XML/XHTML - but if you don't care about
XML/XHTML, just HTML, then you can use nearly all printing characters
as anchor names.

Enjoy,
-- Jamie
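
The intersection given above translates directly into a regular
expression, which makes it easy to test candidate anchor names.  The
names `PORTABLE_ID` and `is_portable_anchor` are hypothetical, and the
pattern covers only the ASCII subset quoted in this message.

```python
import re

# First character [A-Za-z], then any number of [-_:.A-Za-z0-9].
PORTABLE_ID = re.compile(r"[A-Za-z][-_:.A-Za-z0-9]*")


def is_portable_anchor(name: str) -> bool:
    """True if the whole name fits the HTML 4 ID / XML Name intersection."""
    return PORTABLE_ID.fullmatch(name) is not None


for name in (
    "alpha_002a_002d_002a_002d_002a",
    "Statements_002fLines",
    "Statements/Lines",
    "*-ibm-aix*",
    "_002a_002dibm_002daix_002a",
):
    print(name, is_portable_anchor(name))
```

The last name fails the check even though it contains only permitted
characters, because it begins with "_" and not with a letter.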
