public inbox for gcc-bugs@sourceware.org
* [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
@ 2022-06-13 19:50 seurer at gcc dot gnu.org
  2022-07-19 21:39 ` [Bug testsuite/105959] " seurer at gcc dot gnu.org
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: seurer at gcc dot gnu.org @ 2022-06-13 19:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

            Bug ID: 105959
           Summary: new test case
                    c-c++-common/diagnostic-format-sarif-file-4.c from
                    r13-967-g6cf276ddf22066 fails
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: testsuite
          Assignee: unassigned at gcc dot gnu.org
          Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:6cf276ddf22066af780335cd0072d2c27aabe468, r13-967-g6cf276ddf22066
make  -k check-gcc
RUNTESTFLAGS="dg.exp=c-c++-common/diagnostic-format-sarif-file-4.c"
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -Wc++-compat  
scan-sarif-file "text": "  int \\u6587\\u5b57\\u5316\\u3051 = 
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++98 
scan-sarif-file "text": "  int \\u6587\\u5b57\\u5316\\u3051 = 
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++14 
scan-sarif-file "text": "  int \\u6587\\u5b57\\u5316\\u3051 = 
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++17 
scan-sarif-file "text": "  int \\u6587\\u5b57\\u5316\\u3051 = 
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++20 
scan-sarif-file "text": "  int \\u6587\\u5b57\\u5316\\u3051 = 
# of expected passes            4
# of expected passes            16
# of unexpected failures        1
# of unexpected failures        4


commit 6cf276ddf22066af780335cd0072d2c27aabe468 (HEAD, refs/bisect/bad)
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Thu Jun 2 15:40:22 2022 -0400

    diagnostics: add SARIF output format


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
@ 2022-07-19 21:39 ` seurer at gcc dot gnu.org
  2022-07-29 17:55 ` danglin at gcc dot gnu.org
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: seurer at gcc dot gnu.org @ 2022-07-19 21:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #1 from seurer at gcc dot gnu.org ---
*** Bug 105837 has been marked as a duplicate of this bug. ***


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
  2022-07-19 21:39 ` [Bug testsuite/105959] " seurer at gcc dot gnu.org
@ 2022-07-29 17:55 ` danglin at gcc dot gnu.org
  2023-02-16 16:16 ` hp at gcc dot gnu.org
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: danglin at gcc dot gnu.org @ 2022-07-29 17:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

John David Anglin <danglin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Build|powerpc64-linux-gnu,        |powerpc64-linux-gnu,
                   |powerpc64le-linux-gnu       |powerpc64le-linux-gnu
                   |                            |hppa-unknown-linux-gnu
               Host|powerpc64-linux-gnu,        |powerpc64-linux-gnu,
                   |powerpc64le-linux-gnu       |powerpc64le-linux-gnu
                   |                            |hppa-unknown-linux-gnu
                 CC|                            |danglin at gcc dot gnu.org
             Target|powerpc64-linux-gnu,        |powerpc64-linux-gnu,
                   |powerpc64le-linux-gnu       |powerpc64le-linux-gnu
                   |                            |hppa-unknown-linux-gnu

--- Comment #2 from John David Anglin <danglin at gcc dot gnu.org> ---
Also fails on hppa:
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -Wc++-compat  
scan-sarif-file "text": "  int \\\\u6587\\\\u5b57\\\\u5316\\\\u3051 =


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
  2022-07-19 21:39 ` [Bug testsuite/105959] " seurer at gcc dot gnu.org
  2022-07-29 17:55 ` danglin at gcc dot gnu.org
@ 2023-02-16 16:16 ` hp at gcc dot gnu.org
  2023-03-13 20:53 ` dmalcolm at gcc dot gnu.org
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: hp at gcc dot gnu.org @ 2023-02-16 16:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

Hans-Peter Nilsson <hp at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hp at gcc dot gnu.org

--- Comment #3 from Hans-Peter Nilsson <hp at gcc dot gnu.org> ---
...and cris-elf and also native x86_64-pc-linux-gnu (Debian 11).
Looking at the test-case and the output makes me think those multibyte
characters and/or the matching of them requires a newer -or older!- dejagnu
and/or expect and/or Tcl.

Here, "runtest --version" shows:
DejaGnu version 1.6.2
Expect version  5.45.4
Tcl version     8.6

Can someone with a machine where this test passes, paste output of that
command?


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-02-16 16:16 ` hp at gcc dot gnu.org
@ 2023-03-13 20:53 ` dmalcolm at gcc dot gnu.org
  2023-03-14  2:47 ` hp at gcc dot gnu.org
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-13 20:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #4 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
Created attachment 54653
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54653&action=edit
Generated diagnostic-format-sarif-file-4.c.sarif output file on my machine

(In reply to Hans-Peter Nilsson from comment #3)
> Can someone with a machine where this test passes, paste output of that
> command?

Sorry about this.

The test works for me, with:

                === gcc Summary ===

# of expected passes            5
# of expected failures          1

where:

$ runtest --version

emits:

WARNING: Couldn't find the global config file.
DejaGnu version 1.6.1
Expect version  5.45.4
Tcl version     8.6

What does the generated
  testsuite/gcc/diagnostic-format-sarif-file-4.c.sarif 
file look like?  I'm attaching mine for reference.


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-03-13 20:53 ` dmalcolm at gcc dot gnu.org
@ 2023-03-14  2:47 ` hp at gcc dot gnu.org
  2023-03-14  3:04 ` hp at gcc dot gnu.org
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: hp at gcc dot gnu.org @ 2023-03-14  2:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #5 from Hans-Peter Nilsson <hp at gcc dot gnu.org> ---
Created attachment 54658
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54658&action=edit
mine, from native build/test of yesterday (see file for exact version).


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-03-14  2:47 ` hp at gcc dot gnu.org
@ 2023-03-14  3:04 ` hp at gcc dot gnu.org
  2023-03-16 21:52 ` dmalcolm at gcc dot gnu.org
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: hp at gcc dot gnu.org @ 2023-03-14  3:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #6 from Hans-Peter Nilsson <hp at gcc dot gnu.org> ---
(In reply to David Malcolm from comment #4)
> DejaGnu version	1.6.1
> Expect version	5.45.4
> Tcl version	8.6

Close enough to say that's probably *not* it, also see below...

> What does the generated
>   testsuite/gcc/diagnostic-format-sarif-file-4.c.sarif 
> file look like?  I'm attaching mine for reference.

Besides file-paths and version strings, it's the same; attached for reference.
Perhaps your language environment is different?  I always build with
"LC_ALL=C". (everybody speaks the C language - it must be right! ;)

Indeed, if I run the test-suite prefixing the "make" invocation with 'env
"LC_ALL=en_US.UTF-8"' or 'env "LC_ALL=C.UTF-8"' the test passes, so I think we
found the cause: an assumption that the environment speaks UTF-8.  My
environment is ISO-8859-1, and I guess the reporter's and John Anglin's tester
environments are similar.  I'm also guessing you'll see the mirror image, that
is, the error I see, if you run the test-suite prefixing your make invocation
with 'env "LC_ALL=C" "LANG=C"'.

It unfortunately makes no difference to add "setenv LC_ALL C" to dg.exp (it's
already set via gcc-dg.exp), so this needs some other tweak to force the
environment to the preferred setting for the test.


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-03-14  3:04 ` hp at gcc dot gnu.org
@ 2023-03-16 21:52 ` dmalcolm at gcc dot gnu.org
  2023-03-16 21:58 ` dmalcolm at gcc dot gnu.org
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-16 21:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

David Malcolm <dmalcolm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2023-01-30 00:00:00         |2023-03-16
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #7 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
Aha! - thanks for the information.

I think GCC is writing out the .sarif file in UTF-8 form regardless of the
environment on everyone's box.  The issue seems to be this line in the testcase
to check for the UTF-8 in the "snippet" output:
       { dg-final { scan-sarif-file "\"text\": \"  int \\u6587\\u5b57\\u5316\\u3051 = " } }
that's failing somewhere within DejaGnu, presumably due to the environment
differences.

There is some variation due to json::object using a hash_map for the key/value
pairs, which means (annoyingly) it outputs things in arbitrary order, leading
to non-determinism in the .sarif content.

Perhaps it's possible to express byte-level matching in Tcl?  I'll have a look.
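
As an illustration of what byte-level matching means here, the following is a minimal Python sketch (not Tcl, and not what the testsuite does): the file contents stay as raw bytes, and the pattern spells out the UTF-8 encoding of the identifier, so no locale-dependent decoding is involved. The sample bytes mirror the hexdump quoted in this thread; the variable names are made up for the example.

```python
import re

# Raw bytes of the "snippet" fragment, as in the hexdump quoted in this
# thread: ASCII text around the UTF-8 encoding of U+6587 U+5B57 U+5316 U+3051.
sarif_bytes = (
    b'{"text": "  int '
    b"\xe6\x96\x87\xe5\xad\x97\xe5\x8c\x96\xe3\x81\x91"
    b' = *42;\\n"}}'
)

# Build a bytes pattern from the UTF-8 encoding of the identifier.
# re.escape() on bytes leaves the non-ASCII bytes as literals.
identifier_utf8 = "文字化け".encode("utf-8")
byte_pattern = re.compile(
    rb'"text": "  int ' + re.escape(identifier_utf8) + rb" = "
)

# The match is performed on bytes, so the locale never comes into play.
matched = byte_pattern.search(sarif_bytes) is not None
print(matched)
```

The cost of this style is that the test author has to write encoded bytes rather than code points, which is less readable than the existing \\uXXXX directive.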


Details
=======

The source code (gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-4.c)
is indeed UTF-8 encoded; looking at the output of
./contrib/unicode/utf8-dump.py, I see this for line 7:
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
   7 |   int 文字化け = *42;
     |   U+0020            0x20                                    SPACE (separator)
     |   U+0020            0x20                                    SPACE (separator)
     |   U+0069            0x69                     LATIN SMALL LETTER I i
     |   U+006E            0x6e                     LATIN SMALL LETTER N n
     |   U+0074            0x74                     LATIN SMALL LETTER T t
     |   U+0020            0x20                                    SPACE (separator)
     |   U+6587  0xe6 0x96 0x87               CJK UNIFIED IDEOGRAPH-6587 文
     |   U+5B57  0xe5 0xad 0x97               CJK UNIFIED IDEOGRAPH-5B57 字
     |   U+5316  0xe5 0x8c 0x96               CJK UNIFIED IDEOGRAPH-5316 化
     |   U+3051  0xe3 0x81 0x91                       HIRAGANA LETTER KE け
     |   U+0020            0x20                                    SPACE (separator)
     |   U+003D            0x3d                              EQUALS SIGN =
     |   U+0020            0x20                                    SPACE (separator)
     |   U+002A            0x2a                                 ASTERISK *
     |   U+0034            0x34                               DIGIT FOUR 4
     |   U+0032            0x32                                DIGIT TWO 2
     |   U+003B            0x3b                                SEMICOLON ;
     |   U+000A            0x0a                           LINE FEED (LF) (control character)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
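
For readers without a GCC checkout, a dump along these lines can be approximated with a few lines of Python. This is only a rough sketch and not the actual ./contrib/unicode/utf8-dump.py script; the function name dump_utf8 is made up for the example.

```python
import unicodedata


def dump_utf8(text):
    """Return (code point, UTF-8 bytes, Unicode name) for each character."""
    rows = []
    for ch in text:
        encoded = ch.encode("utf-8")
        rows.append((
            f"U+{ord(ch):04X}",
            " ".join(f"0x{b:02x}" for b in encoded),
            # Some characters (e.g. control characters) have no name.
            unicodedata.name(ch, "<unnamed>"),
        ))
    return rows


# Dump the interesting part of line 7 of the test case.
for code_point, utf8_bytes, name in dump_utf8("int 文字化け ="):
    print(f"{code_point:<8} {utf8_bytes:<16} {name}")
```

Each of the four non-ASCII characters occupies three bytes in UTF-8, which is why a reader that decodes the file one byte per character sees twelve characters where the pattern expects four.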

Looking at the output on my box via:
  hexdump -C testsuite/gcc/diagnostic-format-sarif-file-4.c.sarif|less
and looking for "snippet" shows:
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
000005a0  3a 20 7b 22 63 6f 6e 74  65 78 74 52 65 67 69 6f  |: {"contextRegio|
000005b0  6e 22 3a 20 7b 22 73 74  61 72 74 4c 69 6e 65 22  |n": {"startLine"|
000005c0  3a 20 37 2c 20 22 73 6e  69 70 70 65 74 22 3a 20  |: 7, "snippet": |
000005d0  7b 22 74 65 78 74 22 3a  20 22 20 20 69 6e 74 20  |{"text": "  int |
000005e0  e6 96 87 e5 ad 97 e5 8c  96 e3 81 91 20 3d 20 2a  |............ = *|
000005f0  34 32 3b 5c 6e 22 7d 7d  2c 20 22 61 72 74 69 66  |42;\n"}}, "artif|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

where it's been encoded in UTF-8 as:
   e6 96 87 e5 ad 97 e5 8c  96 e3 81 91 20 3d
which I can confirm with ./contrib/unicode/utf8-dump.py, which shows that the
snippet has been written in UTF-8 form:
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
     |   U+0069            0x69                     LATIN SMALL LETTER I i
     |   U+006E            0x6e                     LATIN SMALL LETTER N n
     |   U+0074            0x74                     LATIN SMALL LETTER T t
     |   U+0020            0x20                                    SPACE (separator)
     |   U+6587  0xe6 0x96 0x87               CJK UNIFIED IDEOGRAPH-6587 文
     |   U+5B57  0xe5 0xad 0x97               CJK UNIFIED IDEOGRAPH-5B57 字
     |   U+5316  0xe5 0x8c 0x96               CJK UNIFIED IDEOGRAPH-5316 化
     |   U+3051  0xe3 0x81 0x91                       HIRAGANA LETTER KE け
     |   U+0020            0x20                                    SPACE (separator)
     |   U+003D            0x3d                              EQUALS SIGN =
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The test case has:
  { dg-final { scan-sarif-file "\"text\": \"  int \\u6587\\u5b57\\u5316\\u3051 = " } }
which is looking for the text of the snippet containing the Unicode chars.

Attachment 54658 (with md5sum 67cc5fdbee9006509aa38af635d6cf69) has this for
the snippet:
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
000005f0  73 6e 69 70 70 65 74 22  3a 20 7b 22 74 65 78 74  |snippet": {"text|
00000600  22 3a 20 22 20 20 69 6e  74 20 e6 96 87 e5 ad 97  |": "  int ......|
00000610  e5 8c 96 e3 81 91 20 3d  20 2a 34 32 3b 5c 6e 22  |...... = *42;\n"|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
which is:
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
     |   U+0069            0x69                     LATIN SMALL LETTER I i
     |   U+006E            0x6e                     LATIN SMALL LETTER N n
     |   U+0074            0x74                     LATIN SMALL LETTER T t
     |   U+0020            0x20                                    SPACE (separator)
     |   U+6587  0xe6 0x96 0x87               CJK UNIFIED IDEOGRAPH-6587 文
     |   U+5B57  0xe5 0xad 0x97               CJK UNIFIED IDEOGRAPH-5B57 字
     |   U+5316  0xe5 0x8c 0x96               CJK UNIFIED IDEOGRAPH-5316 化
     |   U+3051  0xe3 0x81 0x91                       HIRAGANA LETTER KE け
     |   U+0020            0x20                                    SPACE (separator)
     |   U+003D            0x3d                              EQUALS SIGN =
     |   U+0020            0x20                                    SPACE (separator)
     |   U+002A            0x2a                                 ASTERISK *
     |   U+0034            0x34                               DIGIT FOUR 4
     |   U+0032            0x32                                DIGIT TWO 2
     |   U+003B            0x3b                                SEMICOLON ;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Hence GCC is also writing out the .sarif file in UTF-8 form in that attachment,
regardless of the environment; the issue is presumably within the handling of
this directive:
       { dg-final { scan-sarif-file "\"text\": \"  int \\u6587\\u5b57\\u5316\\u3051 = " } }
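
The suspected failure mode can be reproduced outside DejaGnu. The Python sketch below is an analogy only: it assumes the scanner decodes the file one byte per character (modelled here as ISO 8859-1), which is what the eventual commit message in this thread suggests happens in a non-UTF-8 locale. The same bytes then match or fail to match the same code-point pattern depending purely on the decoding step.

```python
import re

# The bytes GCC writes: identical on every machine in this thread.
raw = '"text": "  int 文字化け = *42;\\n"'.encode("utf-8")

# A pattern expressed in code points, like the \\uXXXX escapes in the
# dg-final directive.
pattern = re.compile('"text": "  int \u6587\u5b57\u5316\u3051 = ')

# Decoding as UTF-8 recovers the four original characters.
as_utf8 = raw.decode("utf-8")

# Decoding as ISO 8859-1 never fails, but turns each of the 12 identifier
# bytes into its own character (mojibake), so the pattern cannot match.
as_latin1 = raw.decode("iso-8859-1")

found_utf8 = pattern.search(as_utf8) is not None
found_latin1 = pattern.search(as_latin1) is not None
print(found_utf8, found_latin1)  # prints: True False
```

Nothing in the file differs between the two cases, which is consistent with the attachments being byte-for-byte equivalent in the snippet region.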


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-03-16 21:52 ` dmalcolm at gcc dot gnu.org
@ 2023-03-16 21:58 ` dmalcolm at gcc dot gnu.org
  2023-03-16 22:08 ` dmalcolm at gcc dot gnu.org
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-16 21:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #8 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
Note that section 3.1 ("File Format" > "General") specifies:
  "A SARIF log file SHALL be encoded in UTF-8 [RFC3629]."
https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html

Though I suppose it would be possible to escape non-ASCII chars so that the
.sarif file could use the ASCII subset of UTF-8, if there's no other way around
this from the DejaGnu side.


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-03-16 21:58 ` dmalcolm at gcc dot gnu.org
@ 2023-03-16 22:08 ` dmalcolm at gcc dot gnu.org
  2023-03-17  2:55 ` hp at gcc dot gnu.org
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-16 22:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #9 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
(In reply to David Malcolm from comment #7)

[...snip...]

> There is some variation due to json::object using a hash_map for the key/value
> pairs, which means (annoyingly) it outputs things in arbitrary order,
> leading to non-determinism in the .sarif content.

I've filed this as PR 109163.
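
For context, one common way to get deterministic JSON output is to emit keys in a fixed order. The Python sketch below illustrates that general idea only; this thread does not say how PR 109163 was or will be addressed, and the sample keys are simply taken from the hexdump in comment #7.

```python
import json

# Two objects with the same key/value pairs, built in different orders.
first = {"startLine": 7, "snippet": {"text": "  int x = *42;\n"}}
second = {"snippet": {"text": "  int x = *42;\n"}, "startLine": 7}

# The objects are equal, yet their default serializations differ because
# the writer follows each object's own key order.
same_value = first == second
same_default_output = json.dumps(first) == json.dumps(second)

# Sorting the keys while writing makes the output independent of order.
same_sorted_output = (
    json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)
)

print(same_value, same_default_output, same_sorted_output)
```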

[...snip...]


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-03-16 22:08 ` dmalcolm at gcc dot gnu.org
@ 2023-03-17  2:55 ` hp at gcc dot gnu.org
  2023-03-17 14:26 ` dmalcolm at gcc dot gnu.org
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: hp at gcc dot gnu.org @ 2023-03-17  2:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #10 from Hans-Peter Nilsson <hp at gcc dot gnu.org> ---
(In reply to David Malcolm from comment #8)
> Note that section 3.1 ("File Format" > "General") specifies:
>   "A SARIF log file SHALL be encoded in UTF-8 [RFC3629]."
> https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html
> 
> Though I suppose it would be possible to escape non-ASCII chars so that the
> .sarif file could use the ASCII subset of UTF-8,

ISTM the point of that test is heavy use of UTF-8, so you can't get away with
using the ASCII subset.  (I see an identifier using ideographs?  Wouldn't want
to review that code...  Might as well use Linear A -which you indeed can in
UTF-8- - it's all greek to me!)

> if there's no other way
> around this from the DejaGnu side.

Perhaps add a parameter to dg-scan (it enforces exactly two arguments now) that
scan-sarif-file can use: dg-scan would apply "fconfigure $fd -encoding [lindex
$orig_args 2]", and scan-sarif-file would pass "utf-8" or something like that,
since SARIF files are always UTF-8.  Assuming that works, of course; completely
untested theory.
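
The shape of that proposal, sketched in Python rather than Tcl: a generic scanner takes an optional encoding, and the SARIF-specific wrapper always passes UTF-8. The names scan_file and scan_sarif_file are hypothetical stand-ins for dg-scan and scan-sarif-file, and Python's open() is only an analogy for Tcl's open/fconfigure.

```python
import os
import re
import tempfile


def scan_file(path, pattern, encoding=None):
    """Return True if pattern matches the file's text.

    With encoding=None, open() falls back to the locale's preferred
    encoding, which is the environment-dependent behaviour at issue.
    """
    with open(path, "r", encoding=encoding) as fd:
        text = fd.read()
    return re.search(pattern, text) is not None


def scan_sarif_file(path, pattern):
    """SARIF files are always UTF-8, so always request that encoding."""
    return scan_file(path, pattern, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.sarif")
    with open(path, "wb") as out:
        out.write('{"text": "  int 文字化け = *42;\\n"}'.encode("utf-8"))

    wanted = "int \u6587\u5b57\u5316\u3051 = "
    utf8_result = scan_sarif_file(path, wanted)
    latin1_result = scan_file(path, wanted, encoding="iso-8859-1")
    print(utf8_result, latin1_result)
```

Keeping the encoding optional leaves every other caller of the generic scanner on its existing behaviour, so only the SARIF directives change.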


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-03-17  2:55 ` hp at gcc dot gnu.org
@ 2023-03-17 14:26 ` dmalcolm at gcc dot gnu.org
  2023-03-17 15:35 ` dmalcolm at gcc dot gnu.org
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-17 14:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #11 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
(In reply to Hans-Peter Nilsson from comment #10)
> (I see an identifier using ideographs? 
> Wouldn't want to review that code...  Might as well use Linear A -which you
> indeed can in UTF-8- - it's all greek to me!)

FWIW the identifier "文字化け" is the word "mojibake", which is the Japanese word
for snafu with character encodings:
  https://en.wikipedia.org/wiki/Mojibake


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-03-17 14:26 ` dmalcolm at gcc dot gnu.org
@ 2023-03-17 15:35 ` dmalcolm at gcc dot gnu.org
  2023-03-17 18:34 ` dmalcolm at gcc dot gnu.org
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-17 15:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #12 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
Thanks for the ideas.  If I hack the following into dg-scan (to force the
scanned file to be treated as UTF-8 as it is read), then the existing case
works with both:
  LC_ALL=C
  LC_ALL=en_US.UTF-8

so perhaps I can do this just for scan-sarif-file:

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 4b018abcf3d..828002bf6e1 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -59,6 +59,7 @@ proc dg-scan { name positive testcase output_file orig_args } {
        return
     }
     set fd [open $output_file r]
+    fconfigure $fd -encoding utf-8
     set text [read $fd]
     close $fd


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-03-17 15:35 ` dmalcolm at gcc dot gnu.org
@ 2023-03-17 18:34 ` dmalcolm at gcc dot gnu.org
  2023-03-18  0:57 ` hp at gcc dot gnu.org
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-17 18:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #13 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
Created attachment 54698
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54698&action=edit
Patch that I'm about to put through full testing


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-03-17 18:34 ` dmalcolm at gcc dot gnu.org
@ 2023-03-18  0:57 ` hp at gcc dot gnu.org
  2023-03-20 22:10 ` dmalcolm at gcc dot gnu.org
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: hp at gcc dot gnu.org @ 2023-03-18  0:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #14 from Hans-Peter Nilsson <hp at gcc dot gnu.org> ---
(In reply to David Malcolm from comment #13)
> Created attachment 54698 [details]
> Patch that I'm about to put through full testing

(and of course there was an additional hurdle to DTRT for the new argument,
heh)
Yes, like that, LGTM, thanks!
(not an approver)


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-03-18  0:57 ` hp at gcc dot gnu.org
@ 2023-03-20 22:10 ` dmalcolm at gcc dot gnu.org
  2023-03-22 20:50 ` cvs-commit at gcc dot gnu.org
  2023-03-22 20:54 ` dmalcolm at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-20 22:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

David Malcolm <dmalcolm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
                URL|                            |https://gcc.gnu.org/piperma
                   |                            |il/gcc-patches/2023-March/6
                   |                            |14288.html
             Status|ASSIGNED                    |WAITING

--- Comment #15 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
The patch needed a little tweaking to avoid regressing
gcc.dg-selftests/dg-final.exp; I've posted it here for review:
  [PATCH] testsuite: always use UTF-8 in scan-sarif-file[-not] [PR105959]
     https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614288.html


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-03-20 22:10 ` dmalcolm at gcc dot gnu.org
@ 2023-03-22 20:50 ` cvs-commit at gcc dot gnu.org
  2023-03-22 20:54 ` dmalcolm at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-03-22 20:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

--- Comment #16 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by David Malcolm <dmalcolm@gcc.gnu.org>:

https://gcc.gnu.org/g:6b2740946d26ffde7e1318f24bae00443ece387d

commit r13-6815-g6b2740946d26ffde7e1318f24bae00443ece387d
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Wed Mar 22 16:48:27 2023 -0400

    testsuite: always use UTF-8 in scan-sarif-file[-not] [PR105959]

    c-c++-common/diagnostic-format-sarif-file-4.c is a test case for
    quoting non-ASCII source code in a SARIF diagnostic log.

    The SARIF standard mandates that .sarif files are UTF-8 encoded.

    PR testsuite/105959 notes that the test case fails when the system
    encoding is not UTF-8, such as when the "make" invocation is prefixed
    with LC_ALL=C, whereas it works in a UTF-8 locale.

    The root cause is that dg-scan opens the file for reading using the
    "system" encoding; I believe it is falling back to treating all files as
    effectively ISO 8859-1 in a non-UTF-8 locale.

    This patch fixes things by adding a mechanism to dg-scan to allow
    callers to (optionally) specify an encoding to use when reading the
    file, and updating scan-sarif-file (and the -not variant) to always
    use UTF-8 when calling dg-scan, fixing the test case with LC_ALL=C.

    gcc/testsuite/ChangeLog:
            PR testsuite/105959
            * gcc.dg-selftests/dg-final.exp
            (dg_final_directive_check_num_args): Update expected maximum
            number of args for the various directives using dg-scan.
            * lib/scanasm.exp (append_encoding_arg): New procedure.
            (dg-scan): Add optional 3rd argument: the encoding to use when
            reading from the file.
            * lib/scansarif.exp (scan-sarif-file): Treat the file as UTF-8
            encoded when reading it.
            (scan-sarif-file-not): Likewise.

    Signed-off-by: David Malcolm <dmalcolm@redhat.com>


* [Bug testsuite/105959] new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails
  2022-06-13 19:50 [Bug testsuite/105959] New: new test case c-c++-common/diagnostic-format-sarif-file-4.c from r13-967-g6cf276ddf22066 fails seurer at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-03-22 20:50 ` cvs-commit at gcc dot gnu.org
@ 2023-03-22 20:54 ` dmalcolm at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-22 20:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105959

David Malcolm <dmalcolm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|WAITING                     |RESOLVED

--- Comment #17 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
Should be fixed by the above commit
