public inbox for gcc-bugs@sourceware.org
* [Bug other/105429] New: Unnecessary moves generated by the compiler.
@ 2022-04-28 20:02 mareksz1958 at wp dot pl
2022-04-28 20:04 ` [Bug target/105429] Unnecessary moves generated with _mm_crc32_u64 pinskia at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: mareksz1958 at wp dot pl @ 2022-04-28 20:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105429
Bug ID: 105429
Summary: Unnecessary moves generated by the compiler.
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: mareksz1958 at wp dot pl
Target Milestone: ---
The following C code:
>>>
#include <nmmintrin.h>
#include <stdint.h>

uint32_t crc(uint32_t current, const uint8_t *buffer, size_t size) {
    for (size_t i = 0; i < size; i++)
        current = _mm_crc32_u64(current, buffer[i]);
    return current;
}
<<<
generates inefficient assembly at every optimisation level because of an extra
`mov eax, eax' (`movl %eax, %eax' in the AT&T listings below). Output for -Os,
then -O3:
>>>
# -Os
crc:
	movl	%edi, %eax
	xorl	%ecx, %ecx
.L2:
	cmpq	%rdx, %rcx
	je	.L5
	movzbl	(%rsi,%rcx), %edi
	movl	%eax, %eax
	incq	%rcx
	crc32q	%rdi, %rax
	jmp	.L2
.L5:
	ret

# -O3
crc:
	movl	%edi, %eax
	testq	%rdx, %rdx
	je	.L6
	leaq	(%rsi,%rdx), %rcx
.L3:
	movzbl	(%rsi), %edx
	movl	%eax, %eax
	addq	$1, %rsi
	crc32q	%rdx, %rax
	cmpq	%rsi, %rcx
	jne	.L3
.L6:
	ret
<<<
The problem seems to be present in all GCC versions I have access to. The
redundant move greatly worsens the performance of the generated code. When
`_mm_crc32_u64' is replaced by any other function, the problem seems to
disappear.
* [Bug target/105429] Unnecessary moves generated with _mm_crc32_u64
From: pinskia at gcc dot gnu.org @ 2022-04-28 20:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105429
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
* [Bug target/105429] Unnecessary moves generated with _mm_crc32_u64
From: ubizjak at gmail dot com @ 2022-04-29 12:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105429
--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
The intrinsic is defined as:
unsigned __int64 _mm_crc32_u64( unsigned __int64 crc, unsigned __int64 data )
and the "unnecessary" move is in fact a zero-extension:
movl %eax, %eax # 16 [c=1 l=2] *zero_extendsidi2/3
You probably want:
uint32_t crc(uint64_t current, const uint8_t *buffer, size_t size) {
    for (size_t i = 0; i < size; i++)
        current = _mm_crc32_u64(current, buffer[i]);
    return current;
}
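
Why the zero-extension is needed in general can be seen in a small sketch.
`step64' is a hypothetical stand-in, not the CRC32 instruction and not part of
any library; it is deliberately an operation whose result depends on the high
half of its accumulator. With a 32-bit accumulator each iteration truncates the
64-bit result and widens it again; with a 64-bit accumulator the high bits
survive between iterations, so the two loops may return different values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit operation (NOT crc32q): its result
   depends on the high 32 bits of the accumulator. */
uint64_t step64(uint64_t acc, uint64_t data) {
    return (acc >> 32) + acc + data;
}

/* 32-bit accumulator: the assignment truncates the 64-bit result, and the
   next call zero-extends it again. */
uint32_t fold_narrow(uint32_t acc, const uint8_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        acc = (uint32_t)step64(acc, buf[i]);
    return acc;
}

/* 64-bit accumulator: the value is truncated only once, on return. */
uint32_t fold_wide(uint64_t acc, const uint8_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        acc = step64(acc, buf[i]);
    return (uint32_t)acc;
}
```

The two forms are interchangeable only when the operation ignores the high
half of its accumulator; unless the compiler knows that, it has to keep the
zero-extension in the 32-bit version.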
* [Bug target/105429] Unnecessary moves generated with _mm_crc32_u64
From: kspalaiologos at gmail dot com @ 2022-04-29 12:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105429
--- Comment #2 from Palaiologos <kspalaiologos at gmail dot com> ---
The code behaves the same with and without the `mov eax, eax`. CRC32 is a
32-bit checksum, so I'd presume that the high bits of the destination aren't
considered by the instruction.
To support my claim, Vol. 2A 3-257 of the Intel Software Developer's Manual
gives the following operation for F2 REX.W 0F 38 F1 /r:
>>>
TEMP1[63-0] := BIT_REFLECT64 (SRC[63-0])
TEMP2[31-0] := BIT_REFLECT32 (DEST[31-0])
TEMP3[95-0] := TEMP1[63-0] « 32
TEMP4[95-0] := TEMP2[31-0] « 64
TEMP5[95-0] := TEMP3[95-0] XOR TEMP4[95-0]
TEMP6[31-0] := TEMP5[95-0] MOD2 11EDC6F41H
DEST[31-0] := BIT_REFLECT (TEMP6[31-0])
DEST[63-32] := 00000000H
<<<
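
The operation quoted above can be approximated by a portable software model.
This is a sketch under assumptions, not the manual's text: the function names
are hypothetical, the source operand is processed byte by byte starting from
its least significant byte, and 0x82F63B78 is the bit-reflected form of the
polynomial 11EDC6F41H. The point it illustrates is that only DEST[31-0] is
read and that DEST[63-32] is written as zero.

```c
#include <stdint.h>

/* One-byte CRC-32C update in bit-reflected form. */
uint32_t crc32c_byte_model(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Model of the 64-bit form: reads only the low 32 bits of dest and returns
   a value whose high 32 bits are zero. */
uint64_t crc32c_u64_model(uint64_t dest, uint64_t src) {
    uint32_t crc = (uint32_t)dest;            /* DEST[63-32] is ignored */
    for (int i = 0; i < 8; i++)               /* least significant byte first */
        crc = crc32c_byte_model(crc, (uint8_t)(src >> (8 * i)));
    return crc;                               /* zero-extended result */
}
```

In this model, setting arbitrary high bits in `dest` does not change the
result, which matches the claim that the zero-extension before `crc32q` has no
observable effect.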