public inbox for gcc-prs@sourceware.org
* ada/6726: -gnaty miscounts characters in UTF-8 source text
@ 2002-05-19 15:06 starner
  0 siblings, 0 replies; 4+ messages in thread
From: starner @ 2002-05-19 15:06 UTC
  To: gcc-gnats



>Number:         6726
>Category:       ada
>Synopsis:       -gnaty miscounts characters in UTF-8 source text
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    unassigned
>State:          open
>Class:          rejects-legal
>Submitter-Id:   net
>Arrival-Date:   Sun May 19 15:06:01 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     starner@okstate.edu
>Release:        gcc-3.1
>Organization:
>Environment:
Debian woody; ix86-linux
>Description:
-gnaty permits at most 79 characters per line. However, if the source is encoded in UTF-8, the characters are miscounted (by bytes rather than by characters), so lines that are within the limit are rejected. The Cherokee_String line in the attached file is 71 characters long but 99 bytes long, and it is rejected.
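The byte/character distinction the report relies on can be shown with a minimal Ada sketch. This is not GNAT's own code; `Count_UTF8` and `Code_Points` are hypothetical names. In UTF-8 every octet in 16#80# .. 16#BF# continues a character, so counting characters means skipping those octets, while `'Length` on a String counts octets.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Count_UTF8 is

   --  Number of characters (code points) in a UTF-8 encoded String.
   --  Every octet that is not a continuation octet (2#10xx_xxxx#, that is
   --  16#80# .. 16#BF#) starts a new character.
   function Code_Points (S : String) return Natural is
      N : Natural := 0;
   begin
      for I in S'Range loop
         if Character'Pos (S (I)) not in 16#80# .. 16#BF# then
            N := N + 1;
         end if;
      end loop;
      return N;
   end Code_Points;

   --  U+13A0 CHEROKEE LETTER A, which UTF-8 encodes as E1 8E A0.
   Cherokee_A : constant String :=
     Character'Val (16#E1#) & Character'Val (16#8E#) & Character'Val (16#A0#);

   --  A short line holding two Cherokee letters between quotes.
   Line : constant String := "x := """ & Cherokee_A & Cherokee_A & """;";

begin
   Put_Line ("octets:    " & Natural'Image (Line'Length));
   Put_Line ("characters:" & Natural'Image (Code_Points (Line)));
end Count_UTF8;
```

The program prints 14 octets but 10 characters. Applied to the Cherokee_String line, whose 14 Cherokee letters take three octets each, the same two counts come out as 99 and 71.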
>How-To-Repeat:
Run gnatmake -gnaty test_strings.ads. It reports a "(style) This line is too long." error, even though the line is within the limit.
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
----gnatsweb-attachment----
Content-Type: text/plain; name="test_strings.ads"
Content-Disposition: inline; filename="test_strings.ads"

package Test_Strings is

   Cherokee_String : constant Wide_String := "ᎠᏆᏖᏁᏙᎽ ᎠᎽᏍᏉᏟ ᏦᏰᎾ       ";

end Test_Strings;



* Re: ada/6726: -gnaty miscounts characters in UTF-8 source text
@ 2002-12-14 19:26 starner
From: starner @ 2002-12-14 19:26 UTC
  To: nobody; +Cc: gcc-prs

The following reply was made to PR ada/6726; it has been noted by GNATS.

From: starner@okstate.edu
To: bosch@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org,
   nobody@gcc.gnu.org, starner@okstate.edu, gcc-gnats@gcc.gnu.org
Cc:  
Subject: Re: ada/6726: -gnaty miscounts characters in UTF-8 source text
Date: Sat, 14 Dec 2002 21:19:12 -0600 (CST)

 >State-Changed-From-To: open->closed
 [...]
 >    The line length limitation of -gnaty switch is intended to make sure that all lines fit on a regular terminal screen, so that the source can be viewed without problems on all screens. Another issue is that not all wide characters necessarily are the same width: many Asian fixed-spacing terminal fonts use double-width characters for certain glyphs.
 
 You've stated the problems involved in getting this to work completely right.
 What about actually fixing the bug in some way? You could
 
 * Drop in a wcwidth implementation. Markus Kuhn has published one that fits
 in a page or two of code.
 
 * Just call all non-ASCII Unicode characters single width or double
 width. That is a better approximation than the triple width at which most
 UTF-8 characters are currently counted, which is never actually correct.
 
 * Disable character counts for lines on which non-ASCII characters appear,
 possibly with a warning. It's ugly, and the warning is probably overkill,
 but it works.
 
 * Document it, which would be a good start. The current documentation says
 
 If the ^letter m^word LINE_LENGTH^ appears in the string after @option{-gnaty}
 then the length of source lines must not exceed 79 characters, including
 any trailing blanks. The value of 79 allows convenient display on an
 80 character wide device or window, allowing for possible special
 treatment of 80 character lines.
 
 If you choose to interpret this as a documentation error instead of a
 code error, then so be it. But it's clearly one or the other, or both.
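The second suggestion above (count each non-ASCII character as one or two columns rather than one per octet) can be sketched as follows. This is a hypothetical illustration, not GNAT code: `Width` and `Display_Width` are made-up names, only a handful of the East Asian wide ranges from Markus Kuhn's wcwidth are included, and combining characters are ignored, so it is much cruder than a full wcwidth.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Line_Width is

   --  Approximate terminal columns for one code point. A few East Asian
   --  wide ranges (a simplified subset of those in Markus Kuhn's wcwidth)
   --  count as 2; everything else counts as 1. Zero-width combining
   --  characters and control characters are NOT handled.
   function Width (Code : Natural) return Positive is
   begin
      if Code in 16#1100# .. 16#115F#          --  Hangul Jamo initial consonants
        or else (Code in 16#2E80# .. 16#A4CF#  --  CJK .. Yi
                 and then Code /= 16#303F#)
        or else Code in 16#AC00# .. 16#D7A3#   --  Hangul syllables
        or else Code in 16#F900# .. 16#FAFF#   --  CJK compatibility ideographs
        or else Code in 16#FF00# .. 16#FF60#   --  Fullwidth forms
        or else Code in 16#FFE0# .. 16#FFE6#
      then
         return 2;
      else
         return 1;
      end if;
   end Width;

   --  Sum of Width over the code points of a UTF-8 encoded String.
   --  Assumes well-formed UTF-8: no validation is done, and a truncated
   --  sequence at the end raises Constraint_Error.
   function Display_Width (S : String) return Natural is
      Total : Natural := 0;
      I     : Integer := S'First;
      Lead  : Natural;
      Code  : Natural;
      Extra : Natural;
   begin
      while I <= S'Last loop
         Lead := Character'Pos (S (I));
         if Lead < 16#80# then        --  0xxxxxxx: ASCII
            Code := Lead;
            Extra := 0;
         elsif Lead < 16#E0# then     --  110xxxxx: 2-octet sequence
            Code := Lead mod 16#20#;
            Extra := 1;
         elsif Lead < 16#F0# then     --  1110xxxx: 3-octet sequence
            Code := Lead mod 16#10#;
            Extra := 2;
         else                         --  11110xxx: 4-octet sequence
            Code := Lead mod 16#08#;
            Extra := 3;
         end if;
         for K in 1 .. Extra loop
            --  Append the low 6 bits of each continuation octet.
            Code := Code * 64 + Character'Pos (S (I + K)) mod 64;
         end loop;
         Total := Total + Width (Code);
         I := I + 1 + Extra;
      end loop;
      return Total;
   end Display_Width;

   --  U+13A0 CHEROKEE LETTER A (E1 8E A0) and U+4E2D, a CJK ideograph
   --  (E4 B8 AD): both are 3 octets long in UTF-8.
   Cherokee_A : constant String :=
     Character'Val (16#E1#) & Character'Val (16#8E#) & Character'Val (16#A0#);
   Han : constant String :=
     Character'Val (16#E4#) & Character'Val (16#B8#) & Character'Val (16#AD#);

begin
   Put_Line ("abc:       " & Natural'Image (Display_Width ("abc")));
   Put_Line ("Cherokee A:" & Natural'Image (Display_Width (Cherokee_A)));
   Put_Line ("U+4E2D:    " & Natural'Image (Display_Width (Han)));
end Line_Width;
```

With this approximation the Cherokee letter counts as one column and the CJK ideograph as two, instead of three each. A style check could then compare the display width of each line, rather than its octet count, against the 79-column limit quoted from the documentation.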



* Re: ada/6726: -gnaty miscounts characters in UTF-8 source text
@ 2002-12-14 13:11 bosch
From: bosch @ 2002-12-14 13:11 UTC
  To: gcc-bugs, gcc-prs, nobody, starner

Synopsis: -gnaty miscounts characters in UTF-8 source text

State-Changed-From-To: open->closed
State-Changed-By: bosch
State-Changed-When: Sat Dec 14 13:11:32 2002
State-Changed-Why:
    The line length limitation of -gnaty switch is intended to make sure that all lines fit on a regular terminal screen, so that the source can be viewed without problems on all screens. Another issue is that not all wide characters necessarily are the same width: many Asian fixed-spacing terminal fonts use double-width characters for certain glyphs.

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=6726



* Re: ada/6726: -gnaty miscounts characters in UTF-8 source text
@ 2002-05-19 15:16 starner
From: starner @ 2002-05-19 15:16 UTC
  To: nobody; +Cc: gcc-prs

The following reply was made to PR ada/6726; it has been noted by GNATS.

From: starner@okstate.edu
To: gcc-gnats@gcc.gnu.org
Cc:  
Subject: Re: ada/6726: -gnaty miscounts characters in UTF-8 source text
Date: Sun, 19 May 2002 17:15:10 -0500 (CDT)

 You also have to add -gnatW8 so that GNAT knows the source is UTF-8. Even with the encoding specified, it still miscounts the line.



Thread overview: 4 messages
2002-05-19 15:06 ada/6726: -gnaty miscounts characters in UTF-8 source text starner
2002-05-19 15:16 starner
2002-12-14 13:11 bosch
2002-12-14 19:26 starner
