public inbox for gcc-bugs@sourceware.org
* [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8'
@ 2021-02-22 22:40 leo at sai dot msu.ru
  2021-02-22 22:44 ` [Bug libfortran/99210] " leo at sai dot msu.ru
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: leo at sai dot msu.ru @ 2021-02-22 22:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

            Bug ID: 99210
           Summary: X editing for reading file with encoding='utf-8'
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libfortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: leo at sai dot msu.ru
  Target Milestone: ---

Created attachment 50237
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50237&action=edit
Example bug 1x editing for utf-8 encoding file

The nX edit descriptor must skip n characters. For a file opened with
encoding='utf-8', that means skipping n ISO/IEC 10646 characters.

Currently, however, nX skips n bytes, which can cause "Fortran runtime error:
Invalid UTF-8 encoding" on valid UTF-8 files.
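
To illustrate the difference between skipping n bytes and skipping n
characters, here is a minimal C sketch. The helper name skip_utf8_chars is
hypothetical and is not libgfortran code; it assumes the buffer holds
well-formed UTF-8 and only shows where the read position should end up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not libgfortran API):
   return the byte offset reached after skipping up to n UTF-8 characters in
   buf[0..len).  Assumes buf holds well-formed UTF-8.  */
static size_t
skip_utf8_chars (const unsigned char *buf, size_t len, size_t n)
{
  size_t pos = 0;

  assert (buf != NULL);
  while (n > 0 && pos < len)
    {
      /* Step over the lead byte of the current character.  */
      pos++;
      /* Step over its continuation bytes, which all look like 10xxxxxx.  */
      while (pos < len && (buf[pos] & 0xC0) == 0x80)
        pos++;
      n--;
    }
  return pos;
}
```

For the four bytes D0 9F D1 80 (two two-byte characters), skipping one
character gives offset 2. Skipping one byte instead gives offset 1, which
points at the continuation byte 9F, so a decoder that resumes there has no
valid lead byte to start from.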


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
@ 2021-02-22 22:44 ` leo at sai dot msu.ru
  2021-02-23  1:34 ` jvdelisle at gcc dot gnu.org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: leo at sai dot msu.ru @ 2021-02-22 22:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

Serguei E. Leontiev <leo at sai dot msu.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |leo at sai dot msu.ru

--- Comment #1 from Serguei E. Leontiev <leo at sai dot msu.ru> ---
Created attachment 50239
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50239&action=edit
Output and error of example


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
  2021-02-22 22:44 ` [Bug libfortran/99210] " leo at sai dot msu.ru
@ 2021-02-23  1:34 ` jvdelisle at gcc dot gnu.org
  2021-02-28  3:25 ` jvdelisle at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jvdelisle at gcc dot gnu.org @ 2021-02-23  1:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-02-23
                 CC|                            |jvdelisle at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |jvdelisle at gcc dot gnu.org

--- Comment #2 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
I will take this one. I can reproduce the results shown, but I need to
investigate a bit.


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
  2021-02-22 22:44 ` [Bug libfortran/99210] " leo at sai dot msu.ru
  2021-02-23  1:34 ` jvdelisle at gcc dot gnu.org
@ 2021-02-28  3:25 ` jvdelisle at gcc dot gnu.org
  2021-04-17  3:26 ` jvdelisle at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jvdelisle at gcc dot gnu.org @ 2021-02-28  3:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

--- Comment #3 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Here is the real issue. The X format specifier is a position modifier. UTF-8 is
a variable-length character encoding, so moving one character could mean moving
1, 2, 3, or 4 bytes depending on the content of the file.

Up to now we have chosen to move "position" by 1 byte.
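
As a rough sketch of why one character can span 1 to 4 bytes, the length of a
UTF-8 sequence can be read off its lead byte. The function name utf8_seq_len
is hypothetical and is not taken from libgfortran; it classifies the lead byte
only and does not validate continuation bytes or reject overlong forms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not libgfortran API):
   return the number of bytes in the UTF-8 sequence whose lead byte is *p,
   or 0 if *p cannot start a sequence.  */
static size_t
utf8_seq_len (const unsigned char *p)
{
  unsigned char lead;

  assert (p != NULL);
  lead = *p;

  if (lead < 0x80)
    return 1;			/* 0xxxxxxx: one byte (ASCII).  */
  if ((lead & 0xE0) == 0xC0)
    return 2;			/* 110xxxxx: two bytes.  */
  if ((lead & 0xF0) == 0xE0)
    return 3;			/* 1110xxxx: three bytes.  */
  if ((lead & 0xF8) == 0xF0)
    return 4;			/* 11110xxx: four bytes.  */
  return 0;			/* Continuation byte or invalid lead.  */
}
```

A byte-wise skip treats every case as length 1, which is only correct for the
first branch.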

13.8.1.1 Position editing

1 The position edit descriptors T, TL, TR, and X, specify the position at which
the next character will be transmitted to or from the record. If any character
skipped by a position edit descriptor is of type nondefault character,
and the unit is a default character internal file or an external non-Unicode
file, the result of that position editing is processor dependent.

Our interpretation of this has been that the example provided in this PR is
processor dependent. However, the file is opened with encoding='UTF-8', so it
is an external Unicode file and that exemption does not apply.

So, we have to use UTF-8 based skips for READs.  The following patch does this:

diff --git a/libgfortran/io/read.c b/libgfortran/io/read.c
index 7515d912c51..30ff0e0deb7 100644
--- a/libgfortran/io/read.c
+++ b/libgfortran/io/read.c
@@ -1255,6 +1255,23 @@ read_x (st_parameter_dt *dtp, size_t n)

   if (n == 0)
     return;
+    
+  if (dtp->u.p.current_unit->flags.encoding == ENCODING_UTF8)
+    {
+      gfc_char4_t c;
+      size_t nbytes, j;
+    
+      /* Proceed with decoding one character at a time.  */
+      for (j = 0; j < n; j++)
+       {
+         c = read_utf8 (dtp, &nbytes);
+    
+         /* Check for a short read and if so, break out.  */
+         if (nbytes == 0 || c == (gfc_char4_t)0)
+           break;
+       }
+      return;
+    }

   length = n;

What remains is deciding what to do for end-of-file conditions, so I am doing a
little more testing.


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
                   ` (2 preceding siblings ...)
  2021-02-28  3:25 ` jvdelisle at gcc dot gnu.org
@ 2021-04-17  3:26 ` jvdelisle at gcc dot gnu.org
  2021-05-04  2:29 ` jvdelisle at gcc dot gnu.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jvdelisle at gcc dot gnu.org @ 2021-04-17  3:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
                   ` (3 preceding siblings ...)
  2021-04-17  3:26 ` jvdelisle at gcc dot gnu.org
@ 2021-05-04  2:29 ` jvdelisle at gcc dot gnu.org
  2024-02-13  5:00 ` jvdelisle at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jvdelisle at gcc dot gnu.org @ 2021-05-04  2:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

--- Comment #4 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
As far as I can tell, the patch works fine as is. There will be a similar fix
for writing files with encoding='utf-8'.


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
                   ` (4 preceding siblings ...)
  2021-05-04  2:29 ` jvdelisle at gcc dot gnu.org
@ 2024-02-13  5:00 ` jvdelisle at gcc dot gnu.org
  2024-02-13 20:07 ` jvdelisle at gcc dot gnu.org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jvdelisle at gcc dot gnu.org @ 2024-02-13  5:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |jvdelisle at gcc dot gnu.org
                 CC|                            |jvdelisle at gcc dot gnu.org

--- Comment #5 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
I need to keep an eye on this while working on related issues.


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
                   ` (5 preceding siblings ...)
  2024-02-13  5:00 ` jvdelisle at gcc dot gnu.org
@ 2024-02-13 20:07 ` jvdelisle at gcc dot gnu.org
  2024-02-13 22:56 ` jvdelisle at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jvdelisle at gcc dot gnu.org @ 2024-02-13 20:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

--- Comment #6 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
I have reapplied the patch from comment #3; it passes regression testing and
appears to fix the issue. I still need to work up the test case and submit this
for approval.


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
                   ` (6 preceding siblings ...)
  2024-02-13 20:07 ` jvdelisle at gcc dot gnu.org
@ 2024-02-13 22:56 ` jvdelisle at gcc dot gnu.org
  2024-02-14 15:58 ` cvs-commit at gcc dot gnu.org
  2024-02-14 20:00 ` jvdelisle at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: jvdelisle at gcc dot gnu.org @ 2024-02-13 22:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

--- Comment #7 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Submitted for approval.


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
                   ` (7 preceding siblings ...)
  2024-02-13 22:56 ` jvdelisle at gcc dot gnu.org
@ 2024-02-14 15:58 ` cvs-commit at gcc dot gnu.org
  2024-02-14 20:00 ` jvdelisle at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-02-14 15:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jerry DeLisle <jvdelisle@gcc.gnu.org>:

https://gcc.gnu.org/g:b79d3e6a9284703b70688122f7d4955e7c50804a

commit r14-8983-gb79d3e6a9284703b70688122f7d4955e7c50804a
Author: Jerry DeLisle <jvdelisle@gcc.gnu.org>
Date:   Tue Feb 13 14:32:21 2024 -0800

    Fortran: Implement read_x for UTF-8 encoded files.

            PR fortran/99210

    libgfortran/ChangeLog:

            * io/read.c (read_x): If UTF-8 encoding is enabled, use
            read_utf8 to move one character over in the read buffer.

    gcc/testsuite/ChangeLog:

            * gfortran.dg/pr99210.f90: New test.


* [Bug libfortran/99210] X editing for reading file with encoding='utf-8'
  2021-02-22 22:40 [Bug libfortran/99210] New: X editing for reading file with encoding='utf-8' leo at sai dot msu.ru
                   ` (8 preceding siblings ...)
  2024-02-14 15:58 ` cvs-commit at gcc dot gnu.org
@ 2024-02-14 20:00 ` jvdelisle at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: jvdelisle at gcc dot gnu.org @ 2024-02-14 20:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99210

Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #9 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Fixed on mainline. If someone needs a backport, let me know.

