public inbox for gcc-bugs@sourceware.org
* [Bug c/107836] New: x86_64 inline functions -O2/-O3 optimization error
@ 2022-11-23 15:04 czx211355007 at gmail dot com
  2022-11-23 15:11 ` [Bug c/107836] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: czx211355007 at gmail dot com @ 2022-11-23 15:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107836

            Bug ID: 107836
           Summary: x86_64 inline functions -O2/-O3 optimization error
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: czx211355007 at gmail dot com
  Target Milestone: ---
            Target: x86_64-linux-gnu

Created attachment 53952
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53952&action=edit
full assembly for function "matrix_mul"

When compiling the following two functions with -O2 or -O3 options, the
assembly code generated is wrong.
int dot_product(short* a, short* b, int len){
    int result;
    asm("pandn %%mm5,%%mm5;"::);   
    for(int i=0; i < len; i += 4){
        asm(
            "movq %0,%%mm0;"
            "movq %1,%%mm1;"
            "pmaddwd %%mm1,%%mm0;"
            "paddd %%mm0,%%mm5;"          
            :  
            : "m" (a[i]), "m" (b[i])
        );
    }
    asm("movq %%mm5, %%mm0;"
        "psrlq $32,%%mm5;"
        "paddd %%mm0, %%mm5;"
        "movd %%mm5,%0;"
        "emms"
        :"=r" (result)
        :);
    return result;
}

void matrix_mul(int d, short a[d][d], short b[d][d], int c[d][d]){
    for(int i=0;i<d;i++){
        for(int j=0;j<d;j++){
            c[i][j] = dot_product(a[i], b[j], d);
        }   
    }
    return;
}

The part of the assembly code for "matrix_mul" where I see an error:
    14b5:       0f 6f c5                movq   %mm5,%mm0
    14b8:       0f 73 d5 20             psrlq  $0x20,%mm5
    14bc:       0f fe e8                paddd  %mm0,%mm5
    14bf:       0f 7e eb                movd   %mm5,%ebx
    14c2:       0f 77                   emms
    14c4:       0f 1f 40 00             nopl   0x0(%rax)
    14c8:       4b 8d 34 0e             lea    (%r14,%r9,1),%rsi
    14cc:       4b 8d 4c 05 00          lea    0x0(%r13,%r8,1),%rcx
    14d1:       31 ff                   xor    %edi,%edi
    14d3:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
    14d8:       0f df ed                pandn  %mm5,%mm5
    14db:       49 8d 14 3b             lea    (%r11,%rdi,1),%rdx
    14df:       4c 89 c0                mov    %r8,%rax
    14e2:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
    14e8:       0f 6f 00                movq   (%rax),%mm0
    14eb:       0f 6f 0a                movq   (%rdx),%mm1

Here the final reduction reads mm5 (and copies it to mm0) before the loop has
loaded mm0 and mm1 and accumulated their products into mm5, which leads to a
calculation error when "matrix_mul" is used for matrix multiplication.
In addition, compiling at a lower optimization level gives no error and
produces correct results.

* [Bug c/107836] x86_64 inline functions -O2/-O3 optimization error
  2022-11-23 15:04 [Bug c/107836] New: x86_64 inline functions -O2/-O3 optimization error czx211355007 at gmail dot com
@ 2022-11-23 15:11 ` pinskia at gcc dot gnu.org
  2022-11-23 15:48 ` schwab@linux-m68k.org
  2022-11-24  8:36 ` czx211355007 at gmail dot com
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-11-23 15:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107836

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |inline-asm

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Your inline-asm is missing some clobbers, and I don't think you can use
inline-asm this way, keeping a value in mm5 across asm statements, since the
compiler does not know you did that.
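
One possible way to address both points in the comment above is to keep the
whole MMX sequence in a single asm statement, so that mm5 never has to survive
between statements, and to list every register the asm touches as a clobber.
The following is a minimal sketch, not something proposed in the thread. The
operand names (a, b, n, res), the use of pxor instead of pandn to zero mm5,
and the scalar fallback for non-x86-64 builds are all assumptions made for
illustration.

```c
#include <assert.h>

/* Dot product of two arrays of 16-bit integers.
 * len must be a positive multiple of 4 (four shorts per 64-bit MMX load). */
int dot_product(const short *a, const short *b, int len)
{
    assert(len > 0 && len % 4 == 0);
#if defined(__x86_64__) && defined(__GNUC__)
    int result;
    long n = len / 4;                     /* number of 4-element chunks */
    __asm__ volatile(
        "pxor    %%mm5, %%mm5\n\t"        /* accumulator = 0 */
        "1:\n\t"
        "movq    (%[a]), %%mm0\n\t"       /* load four shorts from a */
        "pmaddwd (%[b]), %%mm0\n\t"       /* multiply by b, add adjacent pairs */
        "paddd   %%mm0, %%mm5\n\t"        /* accumulate two 32-bit partial sums */
        "add     $8, %[a]\n\t"
        "add     $8, %[b]\n\t"
        "dec     %[n]\n\t"
        "jnz     1b\n\t"
        "movq    %%mm5, %%mm0\n\t"        /* horizontal add of the two halves */
        "psrlq   $32, %%mm5\n\t"
        "paddd   %%mm0, %%mm5\n\t"
        "movd    %%mm5, %[res]\n\t"
        "emms"                            /* leave MMX state before returning */
        : [res] "=r"(result), [a] "+r"(a), [b] "+r"(b), [n] "+r"(n)
        :
        : "mm0", "mm5", "memory", "cc");  /* everything the asm modifies or reads
                                             behind the compiler's back */
    return result;
#else
    /* Portable fallback so the sketch also builds on other targets. */
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += a[i] * b[i];
    return sum;
#endif
}
```

Because the accumulator lives and dies inside one statement, the compiler has
nothing to reorder: it sees one asm with explicit outputs, and the "memory"
clobber tells it that the arrays behind a and b are read.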

* [Bug c/107836] x86_64 inline functions -O2/-O3 optimization error
  2022-11-23 15:04 [Bug c/107836] New: x86_64 inline functions -O2/-O3 optimization error czx211355007 at gmail dot com
  2022-11-23 15:11 ` [Bug c/107836] " pinskia at gcc dot gnu.org
@ 2022-11-23 15:48 ` schwab@linux-m68k.org
  2022-11-24  8:36 ` czx211355007 at gmail dot com
  2 siblings, 0 replies; 4+ messages in thread
From: schwab@linux-m68k.org @ 2022-11-23 15:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107836

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Andreas Schwab <schwab@linux-m68k.org> ---
There is no dependency whatsoever between the asm statements, so they can be
moved around freely. In particular, the third one produces a constant output
as far as the compiler can see, so moving it to the top of the function is
perfectly valid.
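
The compiler can only order operations whose data flow it can see. One way to
make the accumulator visible, instead of hiding it in mm5, is to hold it in a
C variable and use the MMX intrinsics from <mmintrin.h>. This is a sketch of
an alternative that nobody in the thread proposed. The function name
dot_product_intrin and the scalar fallback for builds without MMX are
assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>
#ifdef __MMX__
#include <mmintrin.h>
#endif

/* Dot product of two arrays of 16-bit integers.
 * len must be a non-negative multiple of 4. */
int dot_product_intrin(const short *a, const short *b, int len)
{
    assert(len >= 0 && len % 4 == 0);
#ifdef __MMX__
    __m64 acc = _mm_setzero_si64();       /* accumulator is an ordinary C value */
    for (int i = 0; i < len; i += 4) {
        __m64 va, vb;
        memcpy(&va, &a[i], sizeof va);    /* unaligned-safe 64-bit loads */
        memcpy(&vb, &b[i], sizeof vb);
        /* acc is both read and written here, so the loop-carried
         * dependency is explicit and cannot be reordered away. */
        acc = _mm_add_pi32(acc, _mm_madd_pi16(va, vb));
    }
    /* Add the high 32-bit partial sum to the low one. */
    acc = _mm_add_pi32(acc, _mm_srli_si64(acc, 32));
    int result = _mm_cvtsi64_si32(acc);
    _mm_empty();                          /* leave MMX state, like emms */
    return result;
#else
    /* Portable fallback so the sketch also builds without MMX. */
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += a[i] * b[i];
    return sum;
#endif
}
```

Here the final reduction takes acc as an input, so it can no longer look like
a statement with a constant output, and the compiler has to keep it after the
loop.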

* [Bug c/107836] x86_64 inline functions -O2/-O3 optimization error
  2022-11-23 15:04 [Bug c/107836] New: x86_64 inline functions -O2/-O3 optimization error czx211355007 at gmail dot com
  2022-11-23 15:11 ` [Bug c/107836] " pinskia at gcc dot gnu.org
  2022-11-23 15:48 ` schwab@linux-m68k.org
@ 2022-11-24  8:36 ` czx211355007 at gmail dot com
  2 siblings, 0 replies; 4+ messages in thread
From: czx211355007 at gmail dot com @ 2022-11-24  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107836

--- Comment #3 from Zixuan Chen <czx211355007 at gmail dot com> ---
I think there is a data dependency between the second asm statement and the
third, a read-after-write one. If the third one is moved to the top then we
can't get the correct value of mm5(mm0). Also, could you explain why the
result using -O1 to compile is correct as expected where the asm statements
remain in the same order as they should be?
