* Susprising behavior of gcc on x86 (-m32) @ 2015-09-08 9:15 Mathieu Malaterre 2015-09-08 9:45 ` Andrew Haley 0 siblings, 1 reply; 14+ messages in thread From: Mathieu Malaterre @ 2015-09-08 9:15 UTC (permalink / raw) To: gcc-help Dear all, I am trying to track down the following OpenJPEG regression on x86 (amd64 is fine): https://github.com/uclouvain/openjpeg/issues/571 In summary, if I compile OpenJPEG (git/master) using gcc (all gcc versions in Debian: 4.8, 4.9 and 5.2 are affected) and try to compress a specific input file, the generated J2K file is invalid. The bug is somewhat problematic within OpenJPEG because it makes lossless compression lossy! I have not been able to reproduce this behavior using clang 3.5 (again in a Debian sid 32-bit chroot), nor from an amd64 Debian sid chroot. What is even more surprising is that I can no longer reproduce the behavior when running under `valgrind` in my 32-bit chroot. I understand that my bug description is short on detail, but I am eager to report a more specific gcc issue. If anyone could help me narrow down this issue, I'd appreciate your comments. Please note that I explicitly disabled all optimizations by using -O0. Thanks ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 9:15 Susprising behavior of gcc on x86 (-m32) Mathieu Malaterre @ 2015-09-08 9:45 ` Andrew Haley 2015-09-08 9:57 ` Mathieu Malaterre 0 siblings, 1 reply; 14+ messages in thread From: Andrew Haley @ 2015-09-08 9:45 UTC (permalink / raw) To: gcc-help On 09/08/2015 10:15 AM, Mathieu Malaterre wrote: > What is even more surprising is that I can no longer reproduce the > behavior using `valgrind` from my 32bits chroot. > > I understand that my bug description is relatively small, but I am > eager to report a more specific gcc issue. If anyone could help me > narrow down this issue, I'd appreciate your comments. > > Please note that that I disabled any kind of optimizations by using > (explicitly!) -O0. I'm guessing that it's some silliness with the FPU, but that's a wild guess. You first need to find out what part of the file is different, and narrow it down from there. One question: how well do you understand the OpenJPEG code base? Andrew. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 9:45 ` Andrew Haley @ 2015-09-08 9:57 ` Mathieu Malaterre 2015-09-08 10:05 ` Jonathan Wakely 0 siblings, 1 reply; 14+ messages in thread From: Mathieu Malaterre @ 2015-09-08 9:57 UTC (permalink / raw) To: Andrew Haley; +Cc: gcc-help On Tue, Sep 8, 2015 at 11:45 AM, Andrew Haley <aph@redhat.com> wrote: > On 09/08/2015 10:15 AM, Mathieu Malaterre wrote: >> What is even more surprising is that I can no longer reproduce the >> behavior using `valgrind` from my 32bits chroot. >> >> I understand that my bug description is relatively small, but I am >> eager to report a more specific gcc issue. If anyone could help me >> narrow down this issue, I'd appreciate your comments. >> >> Please note that that I disabled any kind of optimizations by using >> (explicitly!) -O0. > > I'm guessing that it's some silliness with the FPU, but that's a wild > guess. Technically this code path is *not* using floating point at all (by JPEG 2000 reversible kernel design). It uses integer-based shift and addition operations only. > You first need to find out what part of the file is different, and > narrow it down from there. One question: how well do you understand > the OpenJPEG code base? Let me answer it this way: narrowing this issue down to a minimal C test case is a huge task for me. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 9:57 ` Mathieu Malaterre @ 2015-09-08 10:05 ` Jonathan Wakely 2015-09-08 10:15 ` Mathieu Malaterre ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Jonathan Wakely @ 2015-09-08 10:05 UTC (permalink / raw) To: Mathieu Malaterre; +Cc: Andrew Haley, gcc-help On 8 September 2015 at 10:57, Mathieu Malaterre <malat@debian.org> wrote: > On Tue, Sep 8, 2015 at 11:45 AM, Andrew Haley <aph@redhat.com> wrote: >> On 09/08/2015 10:15 AM, Mathieu Malaterre wrote: >>> What is even more surprising is that I can no longer reproduce the >>> behavior using `valgrind` from my 32bits chroot. >>> >>> I understand that my bug description is relatively small, but I am >>> eager to report a more specific gcc issue. If anyone could help me >>> narrow down this issue, I'd appreciate your comments. >>> >>> Please note that that I disabled any kind of optimizations by using >>> (explicitly!) -O0. >> >> I'm guessing that it's some silliness with the FPU, but that's a wild >> guess. That was my first thought too. To rule it out you could compile with -mfpmath=sse > Technically this code path is *not* using floating point at all (by > JPEG 2000 reversible kernel design). integer based shift&additions > operations only. Then I suggest compiling with -fsanitize=undefined to see if there are any undefined shifts. You could try that with both GCC and Clang, they both support UBsan with slightly different feature sets. >> You first need to find out what part of the file is different, and >> narrow it down from there. One question: how well do you understand >> the OpenJPEG code base? > > Let me answer it this way: this is a huge task -for me- to narrow down > this issue to a minimal C code. That's not the only option. You could compile one file with GCC and all others with Clang and see if you can reproduce it. Repeat for each file, which will narrow down the file where the problem occurs. 
Then you can try splitting that file into smaller pieces, with one function per file, and repeat the process. That would tell you which function or functions get miscompiled by GCC. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 10:05 ` Jonathan Wakely @ 2015-09-08 10:15 ` Mathieu Malaterre 2015-09-08 10:21 ` Mathieu Malaterre 2015-09-08 12:00 ` Mathieu Malaterre 2 siblings, 0 replies; 14+ messages in thread From: Mathieu Malaterre @ 2015-09-08 10:15 UTC (permalink / raw) To: Jonathan Wakely; +Cc: Andrew Haley, gcc-help On Tue, Sep 8, 2015 at 12:04 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: > On 8 September 2015 at 10:57, Mathieu Malaterre <malat@debian.org> wrote: >> On Tue, Sep 8, 2015 at 11:45 AM, Andrew Haley <aph@redhat.com> wrote: >>> On 09/08/2015 10:15 AM, Mathieu Malaterre wrote: >>>> What is even more surprising is that I can no longer reproduce the >>>> behavior using `valgrind` from my 32bits chroot. >>>> >>>> I understand that my bug description is relatively small, but I am >>>> eager to report a more specific gcc issue. If anyone could help me >>>> narrow down this issue, I'd appreciate your comments. >>>> >>>> Please note that that I disabled any kind of optimizations by using >>>> (explicitly!) -O0. >>> >>> I'm guessing that it's some silliness with the FPU, but that's a wild >>> guess. > > That was my first thought too. To rule it out you could compile with > -mfpmath=sse > > >> Technically this code path is *not* using floating point at all (by >> JPEG 2000 reversible kernel design). integer based shift&additions >> operations only. > > Then I suggest compiling with -fsanitize=undefined to see if there are > any undefined shifts. > > You could try that with both GCC and Clang, they both support UBsan > with slightly different feature sets. > >>> You first need to find out what part of the file is different, and >>> narrow it down from there. One question: how well do you understand >>> the OpenJPEG code base? >> >> Let me answer it this way: this is a huge task -for me- to narrow down >> this issue to a minimal C code. > > That's not the only option. 
You could compile one file with GCC and > all others with Clang and see if you can reproduce it. Repeat for each > file, which will narrow down the file where the problem occurs. Then > you can try splitting that file into smaller pieces, with one function > per file, and repeat the process. That would tell you which function > or functions get miscompiled by GCC. Ah! Thanks very much for the info. I was only staring at: https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction I'll give it a shot ASAP. Thx again. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 10:05 ` Jonathan Wakely 2015-09-08 10:15 ` Mathieu Malaterre @ 2015-09-08 10:21 ` Mathieu Malaterre 2015-09-08 10:36 ` Jonathan Wakely 2015-09-08 12:00 ` Mathieu Malaterre 2 siblings, 1 reply; 14+ messages in thread From: Mathieu Malaterre @ 2015-09-08 10:21 UTC (permalink / raw) To: Jonathan Wakely; +Cc: gcc-help On Tue, Sep 8, 2015 at 12:04 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: > On 8 September 2015 at 10:57, Mathieu Malaterre <malat@debian.org> wrote: >> On Tue, Sep 8, 2015 at 11:45 AM, Andrew Haley <aph@redhat.com> wrote: >>> On 09/08/2015 10:15 AM, Mathieu Malaterre wrote: >>>> What is even more surprising is that I can no longer reproduce the >>>> behavior using `valgrind` from my 32bits chroot. >>>> >>>> I understand that my bug description is relatively small, but I am >>>> eager to report a more specific gcc issue. If anyone could help me >>>> narrow down this issue, I'd appreciate your comments. >>>> >>>> Please note that that I disabled any kind of optimizations by using >>>> (explicitly!) -O0. >>> >>> I'm guessing that it's some silliness with the FPU, but that's a wild >>> guess. > > That was my first thought too. To rule it out you could compile with > -mfpmath=sse > > >> Technically this code path is *not* using floating point at all (by >> JPEG 2000 reversible kernel design). integer based shift&additions >> operations only. > > Then I suggest compiling with -fsanitize=undefined to see if there are > any undefined shifts. [...] /home/mathieu/tmp/opj-bug/openjpeg/src/lib/openjp2/t1.c:1517:28: runtime error: left shift of negative value -128 [...] You've saved me hours of time ! Thanks. for reference: https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/t1.c#L1517 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 10:21 ` Mathieu Malaterre @ 2015-09-08 10:36 ` Jonathan Wakely 2015-09-08 11:38 ` Mathieu Malaterre 0 siblings, 1 reply; 14+ messages in thread From: Jonathan Wakely @ 2015-09-08 10:36 UTC (permalink / raw) To: Mathieu Malaterre; +Cc: gcc-help On 8 September 2015 at 11:20, Mathieu Malaterre wrote: > > [...] > /home/mathieu/tmp/opj-bug/openjpeg/src/lib/openjp2/t1.c:1517:28: > runtime error: left shift of negative value -128 > [...] > > You've saved me hours of time ! Thanks. > > for reference: > https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/t1.c#L1517 UBsan for the win! ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 10:36 ` Jonathan Wakely @ 2015-09-08 11:38 ` Mathieu Malaterre 0 siblings, 0 replies; 14+ messages in thread From: Mathieu Malaterre @ 2015-09-08 11:38 UTC (permalink / raw) To: gcc-help On Tue, Sep 8, 2015 at 12:36 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: > On 8 September 2015 at 11:20, Mathieu Malaterre wrote: >> >> [...] >> /home/mathieu/tmp/opj-bug/openjpeg/src/lib/openjp2/t1.c:1517:28: >> runtime error: left shift of negative value -128 >> [...] >> >> You've saved me hours of time ! Thanks. >> >> for reference: >> https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/t1.c#L1517 > > UBsan for the win! The sad news is that replacing: tiledp[tileIndex] <<= T1_NMSEDEC_FRACBITS; with: tiledp[tileIndex] *= (1 << T1_NMSEDEC_FRACBITS); does clear the runtime warning, but the bug is still there :( ^ permalink raw reply [flat|nested] 14+ messages in thread
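For readers following along, the rewrite described in the message above can be sketched in isolation. This is only an illustration: the helper name `scale_coefficient` and the value 6 used for the fractional bit count are assumptions, not taken from the thread or verified against the OpenJPEG headers. The point it shows is that left-shifting a negative signed value is undefined behaviour in C99 and C11, while multiplying by a power of two is well defined as long as the product fits in the type.

```c
#include <stdint.h>

/* Assumed value for illustration only; the real constant lives in the
 * OpenJPEG sources as T1_NMSEDEC_FRACBITS. */
#define FRACBITS_ASSUMED 6

/* Hypothetical helper: scale a signed coefficient by 2^FRACBITS_ASSUMED.
 * The multiplication is well defined for negative inputs provided the
 * product fits in int32_t, unlike `value << FRACBITS_ASSUMED`. */
static int32_t scale_coefficient(int32_t value)
{
    return value * (int32_t)(1 << FRACBITS_ASSUMED);
}
```

With these assumptions, `scale_coefficient(-128)` yields -8192, the result the shift was presumably intended to produce, without tripping UBSan. As the message notes, this removes the warning but is not the cause of the invalid output.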
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 10:05 ` Jonathan Wakely 2015-09-08 10:15 ` Mathieu Malaterre 2015-09-08 10:21 ` Mathieu Malaterre @ 2015-09-08 12:00 ` Mathieu Malaterre 2015-09-08 12:40 ` Mathieu Malaterre 2 siblings, 1 reply; 14+ messages in thread From: Mathieu Malaterre @ 2015-09-08 12:00 UTC (permalink / raw) To: Jonathan Wakely; +Cc: Andrew Haley, gcc-help FYI, On Tue, Sep 8, 2015 at 12:04 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: [...] > That's not the only option. You could compile one file with GCC and > all others with Clang and see if you can reproduce it. Repeat for each > file, which will narrow down the file where the problem occurs. Then > you can try splitting that file into smaller pieces, with one function > per file, and repeat the process. That would tell you which function > or functions get miscompiled by GCC. OK, so if I compile everything with gcc and then only `tcd.c` using clang, then everything works as expected (no symptoms). ref: https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c I'll repeat your approach to find the culprit function. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 12:00 ` Mathieu Malaterre @ 2015-09-08 12:40 ` Mathieu Malaterre 2015-09-08 13:16 ` Markus Trippelsdorf ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Mathieu Malaterre @ 2015-09-08 12:40 UTC (permalink / raw) To: Jonathan Wakely; +Cc: gcc-help On Tue, Sep 8, 2015 at 2:00 PM, Mathieu Malaterre <malat@debian.org> wrote: > FYI, > > On Tue, Sep 8, 2015 at 12:04 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: > [...] >> That's not the only option. You could compile one file with GCC and >> all others with Clang and see if you can reproduce it. Repeat for each >> file, which will narrow down the file where the problem occurs. Then >> you can try splitting that file into smaller pieces, with one function >> per file, and repeat the process. That would tell you which function >> or functions get miscompiled by GCC. > > Ok so if I compile eveything with gcc and then only `tcd.c` using > clang, then everything works as expected (no symptoms). > ref: https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c > > I'll repeat your approach to find the culprit function. And the culprit function is `opj_tcd_makelayer`: https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c#L218 Other than the `if (dd / dr >= thresh)` I do not see anything obviously suspicious. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 12:40 ` Mathieu Malaterre @ 2015-09-08 13:16 ` Markus Trippelsdorf 2015-09-08 13:19 ` Mathieu Malaterre 2015-09-08 13:17 ` Mathieu Malaterre 2015-09-08 13:23 ` Andrew Haley 2 siblings, 1 reply; 14+ messages in thread From: Markus Trippelsdorf @ 2015-09-08 13:16 UTC (permalink / raw) To: Mathieu Malaterre; +Cc: Jonathan Wakely, gcc-help On 2015.09.08 at 14:40 +0200, Mathieu Malaterre wrote: > On Tue, Sep 8, 2015 at 2:00 PM, Mathieu Malaterre <malat@debian.org> wrote: > > FYI, > > > > On Tue, Sep 8, 2015 at 12:04 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: > > [...] > >> That's not the only option. You could compile one file with GCC and > >> all others with Clang and see if you can reproduce it. Repeat for each > >> file, which will narrow down the file where the problem occurs. Then > >> you can try splitting that file into smaller pieces, with one function > >> per file, and repeat the process. That would tell you which function > >> or functions get miscompiled by GCC. > > > > Ok so if I compile eveything with gcc and then only `tcd.c` using > > clang, then everything works as expected (no symptoms). > > ref: https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c > > > > I'll repeat your approach to find the culprit function. > > And the culprit function is `opj_tcd_makelayer`: > > https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c#L218 > > Other than the `if (dd / dr >= thresh)` I do not see anything > obviously suspicious. Looks like a x87 vs. SSE2 issue. You could try adding "-msse2 -mfpmath=sse" for the -m32 case. -- Markus ^ permalink raw reply [flat|nested] 14+ messages in thread
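One way to check which floating-point evaluation mode a given build uses is the standard C99 macro `FLT_EVAL_METHOD` from `<float.h>`. The sketch below is an illustration, not something posted in the thread; the helper names are hypothetical. To my understanding GCC reports 2 for a 32-bit x86 build that uses x87 arithmetic and 0 when `-msse2 -mfpmath=sse` is in effect, but that should be confirmed on the toolchain in question.

```c
#include <float.h>

/* Hypothetical helper: describe a FLT_EVAL_METHOD value as defined by C99. */
static const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "operations evaluated in the precision of their own type";
    case 1:  return "float and double evaluated as double";
    case 2:  return "all operations evaluated as long double";
    default: return "indeterminate or implementation-defined";
    }
}

/* Description for the setting in effect when this file was compiled. */
static const char *current_eval_method(void)
{
    return describe_eval_method(FLT_EVAL_METHOD);
}
```

Printing `current_eval_method()` from a build with and without the suggested flags gives a quick indication of whether intermediate results may carry more precision than `double`.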
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 13:16 ` Markus Trippelsdorf @ 2015-09-08 13:19 ` Mathieu Malaterre 0 siblings, 0 replies; 14+ messages in thread From: Mathieu Malaterre @ 2015-09-08 13:19 UTC (permalink / raw) To: Markus Trippelsdorf; +Cc: gcc-help On Tue, Sep 8, 2015 at 3:16 PM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote: > On 2015.09.08 at 14:40 +0200, Mathieu Malaterre wrote: >> On Tue, Sep 8, 2015 at 2:00 PM, Mathieu Malaterre <malat@debian.org> wrote: >> > FYI, >> > >> > On Tue, Sep 8, 2015 at 12:04 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: >> > [...] >> >> That's not the only option. You could compile one file with GCC and >> >> all others with Clang and see if you can reproduce it. Repeat for each >> >> file, which will narrow down the file where the problem occurs. Then >> >> you can try splitting that file into smaller pieces, with one function >> >> per file, and repeat the process. That would tell you which function >> >> or functions get miscompiled by GCC. >> > >> > Ok so if I compile eveything with gcc and then only `tcd.c` using >> > clang, then everything works as expected (no symptoms). >> > ref: https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c >> > >> > I'll repeat your approach to find the culprit function. >> >> And the culprit function is `opj_tcd_makelayer`: >> >> https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c#L218 >> >> Other than the `if (dd / dr >= thresh)` I do not see anything >> obviously suspicious. > > Looks like a x87 vs. SSE2 issue. You could try adding "-msse2 > -mfpmath=sse" for the -m32 case. Indeed that fixes the symptoms. I'll check with upstream how best to rewrite the floating point comparison. Thx for the help everyone. ^ permalink raw reply [flat|nested] 14+ messages in thread
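As a possible starting point for rewriting the comparison, here is a minimal sketch that forces the quotient to be rounded to `double` before it is compared. It is not the fix upstream adopted (the thread does not say what that was); the function name `slope_at_least` and its parameter types are assumptions modelled on the `dd / dr >= thresh` expression quoted above, and it assumes `dr` is non-zero. Storing into a `volatile double` makes the compiler write the value to memory, which discards any extra x87 precision.

```c
/* Hypothetical helper: returns 1 when dd / dr >= thresh.
 * The quotient is stored in a volatile double first, so it is rounded to
 * double precision before the comparison, even where intermediate results
 * are otherwise kept in wider registers. Assumes dr != 0. */
static int slope_at_least(double dd, unsigned int dr, double thresh)
{
    volatile double slope = dd / (double)dr;
    return slope >= thresh;
}
```

The design choice here is to make the comparison behave the same regardless of `-mfpmath`; compiling the affected file with `-msse2 -mfpmath=sse`, as suggested above, addresses the same symptom at the build level instead.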
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 12:40 ` Mathieu Malaterre 2015-09-08 13:16 ` Markus Trippelsdorf @ 2015-09-08 13:17 ` Mathieu Malaterre 2015-09-08 13:23 ` Andrew Haley 2 siblings, 0 replies; 14+ messages in thread From: Mathieu Malaterre @ 2015-09-08 13:17 UTC (permalink / raw) To: gcc-help On Tue, Sep 8, 2015 at 2:40 PM, Mathieu Malaterre <malat@debian.org> wrote: > On Tue, Sep 8, 2015 at 2:00 PM, Mathieu Malaterre <malat@debian.org> wrote: >> FYI, >> >> On Tue, Sep 8, 2015 at 12:04 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: >> [...] >>> That's not the only option. You could compile one file with GCC and >>> all others with Clang and see if you can reproduce it. Repeat for each >>> file, which will narrow down the file where the problem occurs. Then >>> you can try splitting that file into smaller pieces, with one function >>> per file, and repeat the process. That would tell you which function >>> or functions get miscompiled by GCC. >> >> Ok so if I compile eveything with gcc and then only `tcd.c` using >> clang, then everything works as expected (no symptoms). >> ref: https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c >> >> I'll repeat your approach to find the culprit function. > > And the culprit function is `opj_tcd_makelayer`: > > https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c#L218 > > Other than the `if (dd / dr >= thresh)` I do not see anything > obviously suspicious. The difference shows up in the printed quotient. For GCC 5.2: 61.414308 / 8 = 7.676789. For Clang 3.5: 61.414308 / 8 = 7.676788. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Susprising behavior of gcc on x86 (-m32) 2015-09-08 12:40 ` Mathieu Malaterre 2015-09-08 13:16 ` Markus Trippelsdorf 2015-09-08 13:17 ` Mathieu Malaterre @ 2015-09-08 13:23 ` Andrew Haley 2 siblings, 0 replies; 14+ messages in thread From: Andrew Haley @ 2015-09-08 13:23 UTC (permalink / raw) To: gcc-help On 09/08/2015 01:40 PM, Mathieu Malaterre wrote: > On Tue, Sep 8, 2015 at 2:00 PM, Mathieu Malaterre <malat@debian.org> wrote: >> FYI, >> >> On Tue, Sep 8, 2015 at 12:04 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: >> [...] >>> That's not the only option. You could compile one file with GCC and >>> all others with Clang and see if you can reproduce it. Repeat for each >>> file, which will narrow down the file where the problem occurs. Then >>> you can try splitting that file into smaller pieces, with one function >>> per file, and repeat the process. That would tell you which function >>> or functions get miscompiled by GCC. >> >> Ok so if I compile eveything with gcc and then only `tcd.c` using >> clang, then everything works as expected (no symptoms). >> ref: https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c >> >> I'll repeat your approach to find the culprit function. > > And the culprit function is `opj_tcd_makelayer`: > > https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c#L218 > > Other than the `if (dd / dr >= thresh)` I do not see anything > obviously suspicious. I see floating point, despite your earlier denial. :-) Libopenjpeg has a bad reputation for messing with the floating-point state. Please make sure the library is not linked with -ffast-math. Beyond that, a few printf()s and "diff" should find the problem. Andrew. ^ permalink raw reply [flat|nested] 14+ messages in thread
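The printf-and-diff approach suggested above works best when the printed values expose every bit, since the default `%f` format shows only six decimals. The following is a small sketch under that assumption; the helper name `format_exact` is hypothetical and does not come from the thread. It relies on two standard `printf` conversions: `%.17g`, which gives enough decimal digits to round-trip an IEEE 754 double, and `%a`, which prints the value exactly in hexadecimal.

```c
#include <stdio.h>

/* Hypothetical helper: format a double so that two builds can be compared
 * with "diff" down to the last bit. Writes at most `size` bytes to `buf`
 * and returns what snprintf returns. */
static int format_exact(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%.17g (%a)", value, value);
}
```

Logging `dd`, `dr` and `thresh` this way in both the GCC and the Clang build, then diffing the two logs, should show which operand first differs rather than only the rounded quotient.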