public inbox for gcc-bugs@sourceware.org
* [Bug c/109326] New: Bad assembler code generation for valid C on 886-64
@ 2023-03-29  1:04 susurrus.of.qualia at gmail dot com
  2023-03-29  1:23 ` [Bug middle-end/109326] " pinskia at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: susurrus.of.qualia at gmail dot com @ 2023-03-29  1:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326

            Bug ID: 109326
           Summary: Bad assembler code generation for valid C on 886-64
           Product: gcc
           Version: og10 (devel/omp/gcc-10)
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: susurrus.of.qualia at gmail dot com
  Target Milestone: ---

Created attachment 54782
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54782&action=edit
compiler output

I have a bit of code here that is compiling without warnings and producing what
appear to be gross errors in the assembler output for some functions. 
Pertinent info:

$ gcc10.4 -v
Using built-in specs.
COLLECT_GCC=gcc10.4
COLLECT_LTO_WRAPPER=/home/stevet/libexec/gcc/x86_64-pc-linux-gnu/10.4.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-10.4.0/configure --prefix=/home/stevet
--program-suffix=10.4 --enable-shared --enable-linker-build-id
--without-included-gettext --enable-threads=posix --enable-nls
--enable-bootstrap --enable-clocale=gnu --with-tune=generic
--enable-languages=c --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.4.0 (GCC) 

uname -a
Linux mx 5.18.0-4mx-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1~mx21+1
(2022-08-22) x86_64 GNU/Linux



Unit compilation command:
gcc10.4 -c  -D_POSIX_C_SOURCE=200112L -DOLOCK_192 -DARCH_64 -DLINUX -I./ -I./
-pthread -m64 -std=c99 -Wall -Wextra -Wno-implicit-fallthrough -Werror
-falign-functions=16 -falign-loops=1 -falign-jumps=1
-fno-inline-small-functions -fdiagnostics-color=never -fverbose-asm 
--save-temps  -O3 -ggdb  -o olock.o olock.c


It should be noted that the bad code generation seems lessened, but not
eliminated, at -O2.  Similarly, the problems were slightly different between
gcc-10.2.1 and the most recent 10.x release.


First thing to note is the assembler generated for the relatively simple
olock_reset_op() function.  Near as I can tell, the asm bears exactly zero
relation to the C code of that function.  The mystery constant $0xa06 seems
notable and also appears in init_olock_op_element_struct().

init_olock_op_struct() begins with an access to %fs:0x0, which is then
clobbered by an add $0x0, %rax shortly thereafter.  Perhaps this is normal.

olock_fsm_event() doesn't look good either.  There are three callq *%reg
instances where there should be at most one.

I'm not sure about olock_op_allocator().  olock_opcode_acqs() looks suspicious,
but I'm not that well versed in x86 so I could be wrong.

If I knew that the dynamic linker would fixup the %fs:0x0 references to
something normal I'd have more confidence about the rest of the code, but it
looks like about half the functions aren't correct at this point.

I've not tested any of this code yet; it is still subject to revision while
I clean it up.  With this type of algorithm it is unfortunately necessary to
have mostly correct code before even thinking about testing it.  This version
is close to that point.

Since I can only attach one file, I'll include the assembler output for the
troublesome olock_reset_op() function inline for reference.

0000000000000290 <olock_reset_op>:
     290:       0f b7 57 10             movzwl 0x10(%rdi),%edx
     294:       66 85 d2                test   %dx,%dx
     297:       0f 84 f4 04 00 00       je     791 <olock_reset_op+0x501>
     29d:       8d 42 ff                lea    -0x1(%rdx),%eax
     2a0:       66 83 f8 0e             cmp    $0xe,%ax
     2a4:       0f 86 e8 04 00 00       jbe    792 <olock_reset_op+0x502>
     2aa:       89 d1                   mov    %edx,%ecx
     2ac:       48 8d 47 2c             lea    0x2c(%rdi),%rax
     2b0:       66 c1 e9 04             shr    $0x4,%cx
     2b4:       83 e9 01                sub    $0x1,%ecx
     2b7:       0f b7 c9                movzwl %cx,%ecx
     2ba:       48 8d 0c 49             lea    (%rcx,%rcx,2),%rcx
     2be:       48 c1 e1 07             shl    $0x7,%rcx
     2c2:       48 8d 8c 0f ac 01 00    lea    0x1ac(%rdi,%rcx,1),%rcx
     2c9:       00
     2ca:       41 b9 06 0a 00 00       mov    $0xa06,%r9d
     2d0:       c7 40 f4 00 00 00 00    movl   $0x0,-0xc(%rax)
     2d7:       41 ba 06 0a 00 00       mov    $0xa06,%r10d
     2dd:       41 bb 06 0a 00 00       mov    $0xa06,%r11d
     2e3:       c7 40 0c 00 00 00 00    movl   $0x0,0xc(%rax)
     2ea:       be 06 0a 00 00          mov    $0xa06,%esi
     2ef:       41 b8 06 0a 00 00       mov    $0xa06,%r8d
     2f5:       48 05 80 01 00 00       add    $0x180,%rax
     2fb:       c7 80 a4 fe ff ff 00    movl   $0x0,-0x15c(%rax)
     302:       00 00 00
     305:       c7 80 bc fe ff ff 00    movl   $0x0,-0x144(%rax)
     30c:       00 00 00
     30f:       c7 80 d4 fe ff ff 00    movl   $0x0,-0x12c(%rax)
     316:       00 00 00
     319:       c7 80 ec fe ff ff 00    movl   $0x0,-0x114(%rax)
     320:       00 00 00
     323:       c7 80 04 ff ff ff 00    movl   $0x0,-0xfc(%rax)
     32a:       00 00 00
     32d:       c7 80 1c ff ff ff 00    movl   $0x0,-0xe4(%rax)
     334:       00 00 00
     337:       c7 80 34 ff ff ff 00    movl   $0x0,-0xcc(%rax)
     33e:       00 00 00
     341:       c7 80 4c ff ff ff 00    movl   $0x0,-0xb4(%rax)
     348:       00 00 00
     34b:       c7 80 64 ff ff ff 00    movl   $0x0,-0x9c(%rax)
     352:       00 00 00
     355:       c7 80 7c ff ff ff 00    movl   $0x0,-0x84(%rax)
     35c:       00 00 00
     35f:       c7 40 94 00 00 00 00    movl   $0x0,-0x6c(%rax)
     366:       c7 40 ac 00 00 00 00    movl   $0x0,-0x54(%rax)
     36d:       c7 40 c4 00 00 00 00    movl   $0x0,-0x3c(%rax)
     374:       c7 40 dc 00 00 00 00    movl   $0x0,-0x24(%rax)
     37b:       c6 80 7c fe ff ff 00    movb   $0x0,-0x184(%rax)
     382:       c6 80 94 fe ff ff 00    movb   $0x0,-0x16c(%rax)
     389:       c6 80 ac fe ff ff 00    movb   $0x0,-0x154(%rax)
     390:       c6 80 c4 fe ff ff 00    movb   $0x0,-0x13c(%rax)
     397:       c6 80 dc fe ff ff 00    movb   $0x0,-0x124(%rax)
     39e:       c6 80 f4 fe ff ff 00    movb   $0x0,-0x10c(%rax)
     3a5:       c6 80 0c ff ff ff 00    movb   $0x0,-0xf4(%rax)
     3ac:       c6 80 24 ff ff ff 00    movb   $0x0,-0xdc(%rax)
     3b3:       c6 80 3c ff ff ff 00    movb   $0x0,-0xc4(%rax)
     3ba:       c6 80 54 ff ff ff 00    movb   $0x0,-0xac(%rax)
     3c1:       c6 80 6c ff ff ff 00    movb   $0x0,-0x94(%rax)
     3c8:       c6 40 84 00             movb   $0x0,-0x7c(%rax)
     3cc:       c6 40 9c 00             movb   $0x0,-0x64(%rax)
     3d0:       c6 40 b4 00             movb   $0x0,-0x4c(%rax)
     3d4:       c6 40 cc 00             movb   $0x0,-0x34(%rax)
     3d8:       c6 40 e4 00             movb   $0x0,-0x1c(%rax)
     3dc:       66 44 89 88 80 fe ff    mov    %r9w,-0x180(%rax)
     3e3:       ff
     3e4:       41 b9 06 0a 00 00       mov    $0xa06,%r9d
     3ea:       66 44 89 90 98 fe ff    mov    %r10w,-0x168(%rax)
     3f1:       ff
     3f2:       41 ba 06 0a 00 00       mov    $0xa06,%r10d
     3f8:       66 44 89 98 b0 fe ff    mov    %r11w,-0x150(%rax)
     3ff:       ff
     400:       41 bb 06 0a 00 00       mov    $0xa06,%r11d
     406:       66 89 b0 c8 fe ff ff    mov    %si,-0x138(%rax)
     40d:       be 06 0a 00 00          mov    $0xa06,%esi
     412:       66 44 89 80 e0 fe ff    mov    %r8w,-0x120(%rax)
     419:       ff
     41a:       41 b8 06 0a 00 00       mov    $0xa06,%r8d
     420:       66 44 89 88 f8 fe ff    mov    %r9w,-0x108(%rax)
     427:       ff
     428:       41 b9 06 0a 00 00       mov    $0xa06,%r9d
     42e:       66 44 89 90 10 ff ff    mov    %r10w,-0xf0(%rax)
     435:       ff
     436:       41 ba 06 0a 00 00       mov    $0xa06,%r10d
     43c:       66 44 89 98 28 ff ff    mov    %r11w,-0xd8(%rax)
     443:       ff
     444:       41 bb 06 0a 00 00       mov    $0xa06,%r11d
     44a:       66 89 b0 40 ff ff ff    mov    %si,-0xc0(%rax)
     451:       be 06 0a 00 00          mov    $0xa06,%esi
     456:       66 44 89 80 58 ff ff    mov    %r8w,-0xa8(%rax)
     45d:       ff
     45e:       41 b8 06 0a 00 00       mov    $0xa06,%r8d
     464:       66 44 89 88 70 ff ff    mov    %r9w,-0x90(%rax)
     46b:       ff
     46c:       41 b9 06 0a 00 00       mov    $0xa06,%r9d
     472:       66 44 89 50 88          mov    %r10w,-0x78(%rax)
     477:       66 44 89 58 a0          mov    %r11w,-0x60(%rax)
     47c:       66 89 70 b8             mov    %si,-0x48(%rax)
     480:       66 44 89 40 d0          mov    %r8w,-0x30(%rax)
     485:       66 44 89 48 e8          mov    %r9w,-0x18(%rax)
     48a:       48 39 c8                cmp    %rcx,%rax
     48d:       0f 85 37 fe ff ff       jne    2ca <olock_reset_op+0x3a>
     493:       89 d0                   mov    %edx,%eax
     495:       83 e0 f0                and    $0xfffffff0,%eax
     498:       f6 c2 0f                test   $0xf,%dl
     49b:       0f 84 f8 02 00 00       je     799 <olock_reset_op+0x509>
     4a1:       0f b7 f0                movzwl %ax,%esi
     4a4:       8d 48 01                lea    0x1(%rax),%ecx
     4a7:       48 8d 34 76             lea    (%rsi,%rsi,2),%rsi
     4ab:       48 c1 e6 03             shl    $0x3,%rsi
     4af:       4c 8d 04 37             lea    (%rdi,%rsi,1),%r8
     4b3:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     4ba:       00
     4bb:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     4c0:       41 b8 06 0a 00 00       mov    $0xa06,%r8d
     4c6:       66 44 89 44 37 2c       mov    %r8w,0x2c(%rdi,%rsi,1)
     4cc:       66 39 d1                cmp    %dx,%cx
     4cf:       0f 83 bc 02 00 00       jae    791 <olock_reset_op+0x501>
     4d5:       0f b7 c9                movzwl %cx,%ecx
     4d8:       41 bb 06 0a 00 00       mov    $0xa06,%r11d
     4de:       8d 70 02                lea    0x2(%rax),%esi
     4e1:       48 8d 0c 49             lea    (%rcx,%rcx,2),%rcx
     4e5:       48 c1 e1 03             shl    $0x3,%rcx
     4e9:       4c 8d 04 0f             lea    (%rdi,%rcx,1),%r8
     4ed:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     4f4:       00
     4f5:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     4fa:       66 44 89 5c 0f 2c       mov    %r11w,0x2c(%rdi,%rcx,1)
     500:       66 39 d6                cmp    %dx,%si
     503:       0f 83 88 02 00 00       jae    791 <olock_reset_op+0x501>
     509:       0f b7 f6                movzwl %si,%esi
     50c:       41 ba 06 0a 00 00       mov    $0xa06,%r10d
     512:       8d 48 03                lea    0x3(%rax),%ecx
     515:       48 8d 34 76             lea    (%rsi,%rsi,2),%rsi
     519:       48 c1 e6 03             shl    $0x3,%rsi
     51d:       4c 8d 04 37             lea    (%rdi,%rsi,1),%r8
     521:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     528:       00
     529:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     52e:       66 44 89 54 37 2c       mov    %r10w,0x2c(%rdi,%rsi,1)
     534:       66 39 ca                cmp    %cx,%dx
     537:       0f 86 54 02 00 00       jbe    791 <olock_reset_op+0x501>
     53d:       0f b7 c9                movzwl %cx,%ecx
     540:       41 b9 06 0a 00 00       mov    $0xa06,%r9d
     546:       8d 70 04                lea    0x4(%rax),%esi
     549:       48 8d 0c 49             lea    (%rcx,%rcx,2),%rcx
     54d:       48 c1 e1 03             shl    $0x3,%rcx
     551:       4c 8d 04 0f             lea    (%rdi,%rcx,1),%r8
     555:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     55c:       00
     55d:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     562:       66 44 89 4c 0f 2c       mov    %r9w,0x2c(%rdi,%rcx,1)
     568:       66 39 f2                cmp    %si,%dx
     56b:       0f 86 20 02 00 00       jbe    791 <olock_reset_op+0x501>
     571:       0f b7 f6                movzwl %si,%esi
     574:       8d 48 05                lea    0x5(%rax),%ecx
     577:       48 8d 34 76             lea    (%rsi,%rsi,2),%rsi
     57b:       48 c1 e6 03             shl    $0x3,%rsi
     57f:       4c 8d 04 37             lea    (%rdi,%rsi,1),%r8
     583:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     58a:       00
     58b:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     590:       41 b8 06 0a 00 00       mov    $0xa06,%r8d
     596:       66 44 89 44 37 2c       mov    %r8w,0x2c(%rdi,%rsi,1)
     59c:       66 39 ca                cmp    %cx,%dx
     59f:       0f 86 ec 01 00 00       jbe    791 <olock_reset_op+0x501>
     5a5:       0f b7 c9                movzwl %cx,%ecx
     5a8:       41 bb 06 0a 00 00       mov    $0xa06,%r11d
     5ae:       8d 70 06                lea    0x6(%rax),%esi
     5b1:       48 8d 0c 49             lea    (%rcx,%rcx,2),%rcx
     5b5:       48 c1 e1 03             shl    $0x3,%rcx
     5b9:       4c 8d 04 0f             lea    (%rdi,%rcx,1),%r8
     5bd:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     5c4:       00
     5c5:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     5ca:       66 44 89 5c 0f 2c       mov    %r11w,0x2c(%rdi,%rcx,1)
     5d0:       66 39 f2                cmp    %si,%dx
     5d3:       0f 86 b8 01 00 00       jbe    791 <olock_reset_op+0x501>
     5d9:       0f b7 f6                movzwl %si,%esi
     5dc:       41 ba 06 0a 00 00       mov    $0xa06,%r10d
     5e2:       8d 48 07                lea    0x7(%rax),%ecx
     5e5:       48 8d 34 76             lea    (%rsi,%rsi,2),%rsi
     5e9:       48 c1 e6 03             shl    $0x3,%rsi
     5ed:       4c 8d 04 37             lea    (%rdi,%rsi,1),%r8
     5f1:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     5f8:       00
     5f9:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     5fe:       66 44 89 54 37 2c       mov    %r10w,0x2c(%rdi,%rsi,1)
     604:       66 39 ca                cmp    %cx,%dx
     607:       0f 86 84 01 00 00       jbe    791 <olock_reset_op+0x501>
     60d:       0f b7 c9                movzwl %cx,%ecx
     610:       41 b9 06 0a 00 00       mov    $0xa06,%r9d
     616:       8d 70 08                lea    0x8(%rax),%esi
     619:       48 8d 0c 49             lea    (%rcx,%rcx,2),%rcx
     61d:       48 c1 e1 03             shl    $0x3,%rcx
     621:       4c 8d 04 0f             lea    (%rdi,%rcx,1),%r8
     625:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     62c:       00
     62d:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     632:       66 44 89 4c 0f 2c       mov    %r9w,0x2c(%rdi,%rcx,1)
     638:       66 39 f2                cmp    %si,%dx
     63b:       0f 86 50 01 00 00       jbe    791 <olock_reset_op+0x501>
     641:       0f b7 f6                movzwl %si,%esi
     644:       8d 48 09                lea    0x9(%rax),%ecx
     647:       48 8d 34 76             lea    (%rsi,%rsi,2),%rsi
     64b:       48 c1 e6 03             shl    $0x3,%rsi
     64f:       4c 8d 04 37             lea    (%rdi,%rsi,1),%r8
     653:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     65a:       00
     65b:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     660:       41 b8 06 0a 00 00       mov    $0xa06,%r8d
     666:       66 44 89 44 37 2c       mov    %r8w,0x2c(%rdi,%rsi,1)
     66c:       66 39 ca                cmp    %cx,%dx
     66f:       0f 86 1c 01 00 00       jbe    791 <olock_reset_op+0x501>
     675:       0f b7 c9                movzwl %cx,%ecx
     678:       41 bb 06 0a 00 00       mov    $0xa06,%r11d
     67e:       8d 70 0a                lea    0xa(%rax),%esi
     681:       48 8d 0c 49             lea    (%rcx,%rcx,2),%rcx
     685:       48 c1 e1 03             shl    $0x3,%rcx
     689:       4c 8d 04 0f             lea    (%rdi,%rcx,1),%r8
     68d:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     694:       00
     695:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     69a:       66 44 89 5c 0f 2c       mov    %r11w,0x2c(%rdi,%rcx,1)
     6a0:       66 39 f2                cmp    %si,%dx
     6a3:       0f 86 e8 00 00 00       jbe    791 <olock_reset_op+0x501>
     6a9:       0f b7 f6                movzwl %si,%esi
     6ac:       41 ba 06 0a 00 00       mov    $0xa06,%r10d
     6b2:       8d 48 0b                lea    0xb(%rax),%ecx
     6b5:       48 8d 34 76             lea    (%rsi,%rsi,2),%rsi
     6b9:       48 c1 e6 03             shl    $0x3,%rsi
     6bd:       4c 8d 04 37             lea    (%rdi,%rsi,1),%r8
     6c1:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     6c8:       00
     6c9:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     6ce:       66 44 89 54 37 2c       mov    %r10w,0x2c(%rdi,%rsi,1)
     6d4:       66 39 ca                cmp    %cx,%dx
     6d7:       0f 86 b4 00 00 00       jbe    791 <olock_reset_op+0x501>
     6dd:       0f b7 c9                movzwl %cx,%ecx
     6e0:       41 b9 06 0a 00 00       mov    $0xa06,%r9d
     6e6:       8d 70 0c                lea    0xc(%rax),%esi
     6e9:       48 8d 0c 49             lea    (%rcx,%rcx,2),%rcx
     6ed:       48 c1 e1 03             shl    $0x3,%rcx
     6f1:       4c 8d 04 0f             lea    (%rdi,%rcx,1),%r8
     6f5:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     6fc:       00
     6fd:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     702:       66 44 89 4c 0f 2c       mov    %r9w,0x2c(%rdi,%rcx,1)
     708:       66 39 f2                cmp    %si,%dx
     70b:       0f 86 80 00 00 00       jbe    791 <olock_reset_op+0x501>
     711:       0f b7 f6                movzwl %si,%esi
     714:       8d 48 0d                lea    0xd(%rax),%ecx
     717:       48 8d 34 76             lea    (%rsi,%rsi,2),%rsi
     71b:       48 c1 e6 03             shl    $0x3,%rsi
     71f:       4c 8d 04 37             lea    (%rdi,%rsi,1),%r8
     723:       41 c7 40 20 00 00 00    movl   $0x0,0x20(%r8)
     72a:       00
     72b:       41 c6 40 28 00          movb   $0x0,0x28(%r8)
     730:       41 b8 06 0a 00 00       mov    $0xa06,%r8d
     736:       66 44 89 44 37 2c       mov    %r8w,0x2c(%rdi,%rsi,1)
     73c:       66 39 ca                cmp    %cx,%dx
     73f:       76 50                   jbe    791 <olock_reset_op+0x501>
     741:       0f b7 c9                movzwl %cx,%ecx
     744:       83 c0 0e                add    $0xe,%eax
     747:       48 8d 0c 49             lea    (%rcx,%rcx,2),%rcx
     74b:       48 c1 e1 03             shl    $0x3,%rcx
     74f:       48 8d 34 0f             lea    (%rdi,%rcx,1),%rsi
     753:       c7 46 20 00 00 00 00    movl   $0x0,0x20(%rsi)
     75a:       c6 46 28 00             movb   $0x0,0x28(%rsi)
     75e:       be 06 0a 00 00          mov    $0xa06,%esi
     763:       66 89 74 0f 2c          mov    %si,0x2c(%rdi,%rcx,1)
     768:       66 39 c2                cmp    %ax,%dx
     76b:       76 24                   jbe    791 <olock_reset_op+0x501>
     76d:       0f b7 c0                movzwl %ax,%eax
     770:       48 8d 04 40             lea    (%rax,%rax,2),%rax
     774:       48 c1 e0 03             shl    $0x3,%rax
     778:       48 8d 14 07             lea    (%rdi,%rax,1),%rdx
     77c:       c7 42 20 00 00 00 00    movl   $0x0,0x20(%rdx)
     783:       c6 42 28 00             movb   $0x0,0x28(%rdx)
     787:       ba 06 0a 00 00          mov    $0xa06,%edx
     78c:       66 89 54 07 2c          mov    %dx,0x2c(%rdi,%rax,1)
     791:       c3                      retq
     792:       31 c0                   xor    %eax,%eax
     794:       e9 08 fd ff ff          jmpq   4a1 <olock_reset_op+0x211>
     799:       c3                      retq
     79a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)


There seems to be some structure in the above, but in comparison to the source
it doesn't seem the slightest bit relevant.


* [Bug middle-end/109326] Bad assembler code generation for valid C on 886-64
  2023-03-29  1:04 [Bug c/109326] New: Bad assembler code generation for valid C on 886-64 susurrus.of.qualia at gmail dot com
@ 2023-03-29  1:23 ` pinskia at gcc dot gnu.org
  2023-03-29  1:38 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-03-29  1:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2023-03-29

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
init_olock_op_element_struct asm output looks fine to me:

        movzwl  .LC0(%rip), %eax
        movq    $0, (%rdi)
        movq    $0, 8(%rdi)
        movl    $0, 16(%rdi)
        movw    %ax, 20(%rdi)

LC0 is:
.LC0:
        .byte   6
        .byte   10
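
The two .byte values also account for the $0xa06 constant in the original
report: two adjacent one-byte fields set to 6 and 10 can be written with a
single 16-bit store, and on little-endian x86-64 that 16-bit value is 0x0a06.
A minimal sketch, assuming a hypothetical two-byte struct (demo_pair and both
helpers are illustrative names; the real olock_op_element layout is not shown
in this thread):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for two adjacent one-byte fields; not the real
 * olock_op_element definition, which this thread does not show. */
struct demo_pair {
    uint8_t first;   /* lower address */
    uint8_t second;  /* higher address */
};

/* Little-endian encoding of two bytes as one 16-bit value, spelled out. */
uint16_t demo_le16(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(lo | (hi << 8));
}

/* The same two bytes read back as a single 16-bit object.  On a
 * little-endian machine such as x86-64 this equals demo_le16(first, second). */
uint16_t demo_pair_as_u16(uint8_t first, uint8_t second)
{
    struct demo_pair p = { first, second };
    uint16_t v;

    memcpy(&v, &p, sizeof v);
    return v;
}
```

With first = 6 and second = 10, demo_le16() yields 0x0a06, which is the
immediate that keeps reappearing in the disassembly.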

olock_fsm_event is fine too as it is just duplicating those basic blocks (the
calls).

init_olock_op_struct looks fine really:
        movq    %fs:0, %rax
        pxor    %xmm0, %xmm0
        movups  %xmm0, (%rdi)
        addq    $olock_tparams@tpoff, %rax

In Intel asm syntax:

        mov     rax, QWORD PTR fs:0
        pxor    xmm0, xmm0
        movups  XMMWORD PTR [rdi], xmm0
        add     rax, OFFSET FLAT:olock_tparams@tpoff

it is basically moving the TLS pointer to rax and then adding the offset for
the variable.
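
For reference, a C-level sketch of the kind of source that typically produces
this sequence.  The struct, its fields and the accessor are hypothetical
stand-ins (olock_tparams itself is not shown in this thread); the assumption
is a GCC build for x86-64 Linux where the variable is resolved at static link
time, in which case taking its address is usually compiled to a read of the
thread pointer from %fs:0 plus a fixed @tpoff offset:

```c
/* Hypothetical thread-local parameter block; a stand-in for olock_tparams. */
struct demo_tparams {
    int spin_limit;
    int flags;
};

/* __thread is the GCC spelling; each thread gets its own zeroed copy. */
static __thread struct demo_tparams demo_tls;

/* Returns the address of the calling thread's copy.  This address
 * computation is where the %fs:0 load and the offset add come from. */
struct demo_tparams *demo_get_tparams(void)
{
    return &demo_tls;
}
```

In an unlinked object file the offset is still a placeholder, which is why a
plain objdump -d shows an add of $0x0.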

I don't understand what exactly you are complaining about really.


* [Bug middle-end/109326] Bad assembler code generation for valid C on 886-64
  2023-03-29  1:04 [Bug c/109326] New: Bad assembler code generation for valid C on 886-64 susurrus.of.qualia at gmail dot com
  2023-03-29  1:23 ` [Bug middle-end/109326] " pinskia at gcc dot gnu.org
@ 2023-03-29  1:38 ` pinskia at gcc dot gnu.org
  2023-03-29  2:26 ` susurrus.of.qualia at gmail dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-03-29  1:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that if you are disassembling the object file with objdump -d, you might
want to add the -r option so that the relocations are dumped too. In the
init_olock_op_struct case you are missing the relocation in the object file
because of that.


* [Bug middle-end/109326] Bad assembler code generation for valid C on 886-64
  2023-03-29  1:04 [Bug c/109326] New: Bad assembler code generation for valid C on 886-64 susurrus.of.qualia at gmail dot com
  2023-03-29  1:23 ` [Bug middle-end/109326] " pinskia at gcc dot gnu.org
  2023-03-29  1:38 ` pinskia at gcc dot gnu.org
@ 2023-03-29  2:26 ` susurrus.of.qualia at gmail dot com
  2023-03-29  2:35 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: susurrus.of.qualia at gmail dot com @ 2023-03-29  2:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326

--- Comment #3 from Steve Thompson <susurrus.of.qualia at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> init_olock_op_element_struct asm output looks fine to me:
> 
>         movzwl  .LC0(%rip), %eax
>         movq    $0, (%rdi)
>         movq    $0, 8(%rdi)
>         movl    $0, 16(%rdi)
>         movw    %ax, 20(%rdi)
> 
> LC0 is:
> .LC0:
>         .byte   6
>         .byte   10
> 
> olock_fsm_event is fine too as it is just duplicating those basic blocks
> (the calls).
> 
> init_olock_op_struct looks fine really:
>         movq    %fs:0, %rax
>         pxor    %xmm0, %xmm0
>         movups  %xmm0, (%rdi)
>         addq    $olock_tparams@tpoff, %rax
> 
> In Intel asm syntax:
> 
>         mov     rax, QWORD PTR fs:0
>         pxor    xmm0, xmm0
>         movups  XMMWORD PTR [rdi], xmm0
>         add     rax, OFFSET FLAT:olock_tparams@tpoff
> 
> it is basically moving the TLS pointer to rax and then adding the offset for
> the variable.
> 
> I don't understand what exactly you are complaining about really.

OK, I wasn't sure about the TLS accesses; adding -r to objdump helped clear
that up.  However I don't understand why olock_reset_op() is so large.  It's a
trivial initializer for a descriptor with an array of olock_op_element
structures appended.  There's no way it should look like what I quoted.  I'd be
happy if I am experiencing a fever-dream over nothing due to ignorance, but I
am not convinced that that is the case.  If I am wrong I will be very
disappointed.


* [Bug middle-end/109326] Bad assembler code generation for valid C on 886-64
  2023-03-29  1:04 [Bug c/109326] New: Bad assembler code generation for valid C on 886-64 susurrus.of.qualia at gmail dot com
                   ` (2 preceding siblings ...)
  2023-03-29  2:26 ` susurrus.of.qualia at gmail dot com
@ 2023-03-29  2:35 ` pinskia at gcc dot gnu.org
  2023-03-30  2:18 ` [Bug middle-end/109326] Sub-optimal assembler code generation for valid C on x86-64 susurrus.of.qualia at gmail dot com
  2023-03-30  2:32 ` susurrus.of.qualia at gmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-03-29  2:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Steve Thompson from comment #3)
> However I don't understand why olock_reset_op() is so large.  It's
> a trivial initializer for a descriptor with an array of olock_op_element
> structures appended.  There's no way it should look like what I quoted.  I'd
> be happy if I am experiencing a fever-dream over nothing due to ignorance,
> but I am not convinced that that is the case.  If I am wrong I will be very
> disappointed.

GCC unrolled the loop via vectorizing it.
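
To make that concrete, here is a sketch of the kind of loop being discussed.
All type and function names are hypothetical (the real olock.c is not quoted
in this thread); only the general shape, a 16-bit element count followed by an
array of small structures whose fields are reset, is inferred from the
disassembly.  At -O3 GCC enables its loop vectorizer and is free to trade code
size for throughput on loops like this one:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types: the real olock structures are not shown in this thread. */
struct sketch_elem {
    void    *target;
    uint32_t state;
    uint8_t  busy;
    uint8_t  code_a;
    uint8_t  code_b;
};

struct sketch_op {
    uint16_t nr_elems;
    struct sketch_elem elems[];   /* C99 flexible array member */
};

/* Allocate a descriptor with n elements, filled with 0xff so that the
 * effect of the reset is visible.  Returns NULL on allocation failure. */
struct sketch_op *sketch_alloc(uint16_t n)
{
    size_t bytes = sizeof(struct sketch_op)
                 + (size_t)n * sizeof(struct sketch_elem);
    struct sketch_op *op = malloc(bytes);

    if (op != NULL) {
        memset(op, 0xff, bytes);
        op->nr_elems = n;
    }
    return op;
}

/* The "trivial initializer": a single short loop in the source. */
void sketch_reset(struct sketch_op *op)
{
    for (uint16_t i = 0; i < op->nr_elems; i++) {
        op->elems[i].state  = 0;
        op->elems[i].busy   = 0;
        op->elems[i].code_a = 6;
        op->elems[i].code_b = 10;
    }
}
```

The source stays this small no matter how large the generated code becomes;
the expansion happens entirely in the optimizer.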


* [Bug middle-end/109326] Sub-optimal assembler code generation for valid C on x86-64
  2023-03-29  1:04 [Bug c/109326] New: Bad assembler code generation for valid C on 886-64 susurrus.of.qualia at gmail dot com
                   ` (3 preceding siblings ...)
  2023-03-29  2:35 ` pinskia at gcc dot gnu.org
@ 2023-03-30  2:18 ` susurrus.of.qualia at gmail dot com
  2023-03-30  2:32 ` susurrus.of.qualia at gmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: susurrus.of.qualia at gmail dot com @ 2023-03-30  2:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326

--- Comment #5 from Steve Thompson <susurrus.of.qualia at gmail dot com> ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Steve Thompson from comment #3)
> > However I don't understand why olock_reset_op() is so large.  It's
> > a trivial initializer for a descriptor with an array of olock_op_element
> > structures appended.  There's no way it should look like what I quoted.  I'd
> > be happy if I am experiencing a fever-dream over nothing due to ignorance,
> > but I am not convinced that that is the case.  If I am wrong I will be very
> > disappointed.
> 
> GCC unrolled the loop via vectorizing it.

OMG did it ever.  It seems that I'm an idiot and must apologise for wasting
everyone's time.

I fixed up some remaining support code and dug into it with gdb and determined
that it does, in fact, work.  There appear to be distinct paths for particular
array ranges and logic to take care of the odd leftover elements, sort of like
memcpy handling large blocks.
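
Written out by hand, that structure looks roughly like the sketch below: a
main path that handles whole blocks of 16 elements and a tail path for the 0
to 15 elements left over.  This is only an illustration of the shape of the
generated code, with hypothetical names and a simplified element type; it is
not the actual olock source.

```c
#include <stdint.h>
#include <string.h>

#define BLK_WIDTH 16   /* elements handled per main-loop iteration */
#define BLK_MAX   64   /* capacity of the demo arrays below */

/* Simplified, padding-free stand-in for the real element type. */
struct blk_elem {
    uint32_t state;
    uint8_t  busy;
    uint8_t  code_a;
    uint8_t  code_b;
    uint8_t  spare;    /* never touched by the reset */
};

static void blk_reset_one(struct blk_elem *e)
{
    e->state  = 0;
    e->busy   = 0;
    e->code_a = 6;
    e->code_b = 10;
}

/* The compact form: one loop, one element per iteration. */
void blk_reset_simple(struct blk_elem *e, uint16_t n)
{
    for (uint16_t i = 0; i < n; i++)
        blk_reset_one(&e[i]);
}

/* The expanded form: whole blocks first, leftovers afterwards. */
void blk_reset_blocked(struct blk_elem *e, uint16_t n)
{
    uint16_t i = 0;

    /* Main path: skipped entirely when n < BLK_WIDTH. */
    for (; n - i >= BLK_WIDTH; i += BLK_WIDTH)
        for (uint16_t k = 0; k < BLK_WIDTH; k++)
            blk_reset_one(&e[i + k]);

    /* Tail path: the remaining n % BLK_WIDTH elements. */
    for (; i < n; i++)
        blk_reset_one(&e[i]);
}

/* Returns 1 if both forms leave identical arrays for a given n. */
int blk_same_result(uint16_t n)
{
    struct blk_elem a[BLK_MAX], b[BLK_MAX];

    if (n > BLK_MAX)
        return 0;
    memset(a, 0xff, sizeof a);
    memset(b, 0xff, sizeof b);
    blk_reset_simple(a, n);
    blk_reset_blocked(b, n);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both forms compute the same thing; the blocked one simply spends more
instructions to do less loop bookkeeping per element.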

But I have to say that I really don't like it, and obviously I can work around
it by making the while() block similar to what is done in olock_init_op().
That gives me two functions with a combined text of 64 bytes if there is no
padding.  Compare this to the 1.2KB of the original disassembly, a generous
factor of 20 in code expansion.  That seems like a great way to bloat code.

I realize that -Os is available, but it eliminates a bunch of supposed inline
functions leading to linker errors for the missing symbols.  I'm not about to
try finding out why for the time being as I don't really need it.

For fun I built a short test program and measured the latency across
olock_reset_op for various array lengths:

          1    8   16   32
64B code:

1.2K code:


* [Bug middle-end/109326] Sub-optimal assembler code generation for valid C on x86-64
  2023-03-29  1:04 [Bug c/109326] New: Bad assembler code generation for valid C on 886-64 susurrus.of.qualia at gmail dot com
                   ` (4 preceding siblings ...)
  2023-03-30  2:18 ` [Bug middle-end/109326] Sub-optimal assembler code generation for valid C on x86-64 susurrus.of.qualia at gmail dot com
@ 2023-03-30  2:32 ` susurrus.of.qualia at gmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: susurrus.of.qualia at gmail dot com @ 2023-03-30  2:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109326

--- Comment #6 from Steve Thompson <susurrus.of.qualia at gmail dot com> ---
(In reply to Steve Thompson from comment #5)
>           1    8   16   32
> 64B code:
> 
> 1.2K code:

Sorry, my touchpad glitched and sent prematurely.

For the overlarge vectorized version I hate:
[28]  nr_ops=1      nr_samples=1000000(0)   min=1       avg=5       max=12248
[28]  nr_ops=8      nr_samples=1000000(0)   min=1       avg=6       max=13022
[28]  nr_ops=16     nr_samples=1000000(0)   min=8       avg=11      max=9548 
[28]  nr_ops=32     nr_samples=1000000(0)   min=26      avg=33      max=8126 
[28]  nr_ops=64     nr_samples=1000000(0)   min=62      avg=73      max=11186
[28]  nr_ops=128    nr_samples=1000000(0)   min=134     avg=153     max=14426
[28]  nr_ops=256    nr_samples=1000000(0)   min=296     avg=312     max=12608
[28]  nr_ops=1024   nr_samples=1000000(0)   min=1250    avg=1269    max=23858

And the compact, esthetically pleasing version I like:
[28]  nr_ops=1      nr_samples=1000000(0)   min=1       avg=5       max=7910 
[28]  nr_ops=8      nr_samples=1000000(0)   min=1       avg=7       max=20150
[28]  nr_ops=16     nr_samples=1000000(0)   min=8       avg=24      max=11402
[28]  nr_ops=32     nr_samples=1000000(0)   min=62      avg=74      max=20582
[28]  nr_ops=64     nr_samples=1000000(0)   min=152     avg=153     max=12482
[28]  nr_ops=128    nr_samples=1000000(0)   min=296     avg=313     max=33884
[28]  nr_ops=256    nr_samples=1000000(0)   min=620     avg=632     max=22940
[28]  nr_ops=1024   nr_samples=1000000(0)   min=2528    avg=2546    max=25064

(System is an AMD Ryzen 5700U laptop; the [28] is the measured cycle latency of
the RDTSCP operation; the number in parentheses counts the occasional bad
sample.)
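
For anyone wanting to reproduce this kind of measurement, a minimal sketch of
a min/avg/max sampler follows.  It is not the test program used above: the
names are hypothetical, the overhead of the timer itself is not subtracted,
and treating a backwards-running counter as a "bad sample" is an assumption
about what that column means.  On x86 it reads the counter with the __rdtscp
intrinsic from <x86intrin.h>; elsewhere it falls back to clock_gettime().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for clock_gettime() under -std=c99 */
#endif
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

struct lat_stats {
    uint64_t min, avg, max;
    uint64_t bad;   /* samples discarded because the counter went backwards */
};

/* Read a monotonically increasing tick counter. */
static uint64_t lat_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int aux;
    return __rdtscp(&aux);
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Time fn(arg) nr_samples times and summarize the tick deltas. */
struct lat_stats lat_measure(void (*fn)(void *), void *arg, uint64_t nr_samples)
{
    struct lat_stats s = { UINT64_MAX, 0, 0, 0 };
    uint64_t sum = 0, good = 0;

    for (uint64_t i = 0; i < nr_samples; i++) {
        uint64_t t0 = lat_ticks();
        fn(arg);
        uint64_t t1 = lat_ticks();

        if (t1 < t0) {          /* assumed meaning of a "bad" sample */
            s.bad++;
            continue;
        }
        uint64_t d = t1 - t0;
        if (d < s.min) s.min = d;
        if (d > s.max) s.max = d;
        sum += d;
        good++;
    }
    if (good == 0)
        s.min = 0;
    else
        s.avg = sum / good;
    return s;
}

/* Trivial workload so the sampler can be tried on its own. */
static volatile uint32_t lat_sink[32];

void lat_demo_work(void *arg)
{
    (void)arg;
    for (int i = 0; i < 32; i++)
        lat_sink[i] = 0;
}
```

The large max values in the tables are what one would expect from such a
loop on a general-purpose OS, where an interrupt or a migration can land
inside a single sample.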


As it turns out, there is no advantage to the vectorized version below arrays
of 16 elements; from 16 upward it is approximately twice as fast.  Some will be
happy to pay that cost for the extra performance I suppose, but it still seems
wasteful.

Again, sorry for being an idiot.
