public inbox for gcc-bugs@sourceware.org
* [Bug other/105429] New: Unnecessary moves generated by the compiler.
@ 2022-04-28 20:02 mareksz1958 at wp dot pl
  2022-04-28 20:04 ` [Bug target/105429] Unnecessary moves generated with _mm_crc32_u64 pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: mareksz1958 at wp dot pl @ 2022-04-28 20:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105429

            Bug ID: 105429
           Summary: Unnecessary moves generated by the compiler.
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mareksz1958 at wp dot pl
  Target Milestone: ---

The following C code:

>>>
#include <nmmintrin.h>
#include <stddef.h>
#include <stdint.h>
uint32_t crc(uint32_t current, const uint8_t *buffer, size_t size) {
    for(size_t i = 0; i < size; i++)
        current = _mm_crc32_u64(current, buffer[i]);
    return current;
}
<<<

generates inefficient assembly at every optimisation level because of an extra
`mov eax, eax' (`movl %eax, %eax' in the AT&T-syntax listings). The -Os and -O3
output is shown below, in that order:

>>>
crc:
        movl    %edi, %eax
        xorl    %ecx, %ecx
.L2:
        cmpq    %rdx, %rcx
        je      .L5
        movzbl  (%rsi,%rcx), %edi
        movl    %eax, %eax
        incq    %rcx
        crc32q  %rdi, %rax
        jmp     .L2
.L5:
        ret

crc:
        movl    %edi, %eax
        testq   %rdx, %rdx
        je      .L6
        leaq    (%rsi,%rdx), %rcx
.L3:
        movzbl  (%rsi), %edx
        movl    %eax, %eax
        addq    $1, %rsi
        crc32q  %rdx, %rax
        cmpq    %rsi, %rcx
        jne     .L3
.L6:
        ret
<<<

The problem seems to be present in all GCC versions I have access to. The
redundant move greatly worsens the performance of the generated code. When
`_mm_crc32_u64' is replaced by any other function, the problem seems to
disappear.

* [Bug target/105429] Unnecessary moves generated with _mm_crc32_u64
From: pinskia at gcc dot gnu.org @ 2022-04-28 20:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105429

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

* [Bug target/105429] Unnecessary moves generated with _mm_crc32_u64
From: ubizjak at gmail dot com @ 2022-04-29 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105429

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
The intrinsic is defined as:

unsigned __int64 _mm_crc32_u64( unsigned __int64 crc, unsigned __int64 data )

and the unnecessary move is in fact a zero-extension:

        movl    %eax, %eax      # 16    [c=1 l=2]  *zero_extendsidi2/3

You probably want:

uint32_t crc(uint64_t current, const uint8_t *buffer, size_t size) {
    for(size_t i = 0; i < size; i++)
        current = _mm_crc32_u64(current, buffer[i]);
    return current;
}

* [Bug target/105429] Unnecessary moves generated with _mm_crc32_u64
From: kspalaiologos at gmail dot com @ 2022-04-29 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105429

--- Comment #2 from Palaiologos <kspalaiologos at gmail dot com> ---
I have observed the same behaviour with and without `mov eax, eax`. CRC32 is a
32-bit checksum, so I'd presume that the high bits aren't considered by the
instruction.

To support my claim, Vol. 2A 3-257 of the Intel Software Developer's Manual
gives the following operation for F2 REX.W 0F 38 F1 /r:

>>>
TEMP1[63-0] := BIT_REFLECT64 (SRC[63-0])
TEMP2[31-0] := BIT_REFLECT32 (DEST[31-0])
TEMP3[95-0] := TEMP1[63-0] « 32
TEMP4[95-0] := TEMP2[31-0] « 64
TEMP5[95-0] := TEMP3[95-0] XOR TEMP4[95-0]
TEMP6[31-0] := TEMP5[95-0] MOD2 11EDC6F41H
DEST[31-0] := BIT_REFLECT (TEMP6[31-0])
DEST[63-32] := 00000000H
<<<
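
The operation quoted above can be modelled in portable C to check the claim
that DEST[63-32] does not influence the result. This is only a sketch and does
not use the intrinsic: `crc32c_step' and `crc32q_model' are hypothetical names,
and the code assumes the usual bit-reflected formulation of CRC-32C, in which
0x82F63B78 is the bit-reversed form of 1EDC6F41H and the source bytes are
consumed least-significant first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold the low `nbytes` bytes of `src` into `crc`,
 * least-significant byte first, using the reflected CRC-32C polynomial.
 * No initial value or final XOR is applied here. */
uint32_t crc32c_step(uint32_t crc, uint64_t src, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        crc ^= (uint8_t)(src >> (8 * i));
        for (int bit = 0; bit < 8; bit++) {
            /* Mask is all ones when the low bit is set, zero otherwise. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

/* Hypothetical model of the 64-bit form: only DEST[31-0] is read, and the
 * 32-bit result is returned zero-extended (DEST[63-32] := 0). */
uint64_t crc32q_model(uint64_t dest, uint64_t src)
{
    return (uint64_t)crc32c_step((uint32_t)dest, src, 8);
}
```

In this model, calling `crc32q_model' with any value in the upper half of
`dest' gives the same result as calling it with the upper half cleared, which
is the property that would make the zero-extension before `crc32q' redundant.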
