public inbox for gcc-bugs@sourceware.org
* [Bug c/100320] New: regression: 32-bit x86 memcpy is suboptimal
@ 2021-04-28 15:09 vda.linux at googlemail dot com
2021-04-28 15:39 ` [Bug target/100320] [8/9/10/11/12 Regression] " jakub at gcc dot gnu.org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: vda.linux at googlemail dot com @ 2021-04-28 15:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
Bug ID: 100320
Summary: regression: 32-bit x86 memcpy is suboptimal
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: vda.linux at googlemail dot com
Target Milestone: ---
Bug 21329 has returned.
32-bit x86 memory block moves are using "movl $LEN,%ecx; rep movsl" insns.
However, for short fixed-size blocks it is more efficient to just repeat a few
"movsl" insns, which makes the "mov $LEN,%ecx" insn unnecessary.
That is shorter and, more importantly, "rep movsl" is a slow-start microcoded insn
(it is faster than moves through general-purpose registers only on blocks
larger than 100-200 bytes). OTOH, a bare "movsl" is not microcoded and takes ~4
cycles to execute.
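To make the code-size half of this argument concrete, here is a minimal C sketch (not GCC code) that counts encoded bytes for both strategies. The 5-byte "mov $imm32,%ecx" and 2-byte "rep movs" sizes can be read off the objdump output in this report; the 1-byte movsl/movsb and 2-byte movsw (operand-size prefix) sizes are the usual 32-bit encodings. The function and constant names are made up for illustration, and the pointer setup in %esi/%edi is left out because both strategies need it.

```c
#include <stddef.h>

/* Encoded sizes in 32-bit mode.  "mov $imm32,%ecx" is b9 plus four
   immediate bytes; "rep movsl" is f3 a5; bare "movsl" is a5;
   "movsw" is 66 a5; "movsb" is a4.  */
enum { MOV_IMM32_ECX = 5, REP_MOVS = 2, MOVSL = 1, MOVSW = 2, MOVSB = 1 };

/* Bytes of code for "mov $count,%ecx; rep movs{l,b}".  */
size_t
rep_size (size_t len)
{
  (void) len;  /* the encoding does not depend on the count */
  return MOV_IMM32_ECX + REP_MOVS;
}

/* Bytes of code for the (movsl;)*(movsw;)?(movsb;)? sequence.  */
size_t
unrolled_size (size_t len)
{
  return (len / 4) * MOVSL
         + ((len & 2) ? MOVSW : 0)
         + ((len & 1) ? MOVSB : 0);
}

/* Nonzero if the unrolled sequence is strictly shorter.  */
int
unrolled_is_shorter (size_t len)
{
  return unrolled_size (len) < rep_size (len);
}
```

Under these assumptions a 16-byte copy needs 4 bytes of movsl against 7 bytes for the rep form, and the unrolled form stops being shorter at 28 bytes. This models size only and says nothing about the cycle counts mentioned above.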
Bug 21329 was closed as fixed by this commit:
CVSROOT:        /cvs/gcc
Module name:    gcc
Branch:         gcc-4_0-rhl-branch
Changes by:     jakub@gcc.gnu.org       2005-05-18 19:08:44

Modified files:
        gcc            : ChangeLog
        gcc/config/i386: i386.c

Log message:
        2005-05-06  Denis Vlasenko  <vda@port.imtp.ilyichevsk.odessa.ua>
                    Jakub Jelinek  <jakub@redhat.com>

        PR target/21329
        * config/i386/i386.c (ix86_expand_movmem): Don't use rep; movsb
        for -Os if (movsl;)*(movsw;)?(movsb;)? sequence is shorter.
        Don't use rep; movs{l,q} if the repetition count is really small,
        instead use a sequence of movs{l,q} instructions.

(The above is commit 95935e2db5c45bef5631f51538d1e10d8b5b7524 in
gcc.gnu.org/git/gcc.git;
that code seems to have been largely replaced by:
commit 8c996513856f2769aee1730cb211050fef055fb5
Author: Jan Hubicka <jh@suse.cz>
Date: Mon Nov 27 17:00:26 2006 +010
expr.c (emit_block_move_via_libcall): Export.
)
With gcc version 11.0.0 20210210 (Red Hat 11.0.0-0) (GCC) I see "rep movsl"s
again:
#include <string.h>
void *f(void *d, const void *s)
{ return memcpy(d, s, 16); }
$ gcc -Os -m32 -fomit-frame-pointer -c -o z.o z.c && objdump -drw z.o
z.o:     file format elf32-i386

Disassembly of section .text:

00000000 <f>:
   0:   57                      push   %edi
   1:   b9 04 00 00 00          mov    $0x4,%ecx
   6:   56                      push   %esi
   7:   8b 44 24 0c             mov    0xc(%esp),%eax
   b:   8b 74 24 10             mov    0x10(%esp),%esi
   f:   89 c7                   mov    %eax,%edi
  11:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
  13:   5e                      pop    %esi
  14:   5f                      pop    %edi
  15:   c3                      ret

The expected code would not have "mov $0x4,%ecx" and would have "rep movsl"
replaced by "movsl;movsl;movsl;movsl".
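For readers less familiar with the string insns, the following is a rough C model of what the expected "movsl;movsl;movsl;movsl" sequence does. It is a sketch, not compiler output: struct regs, movsl() and copy16() are hypothetical names, and the direction flag is assumed clear, so both pointers advance.

```c
#include <string.h>

/* Model of the two pointer registers that movsl uses implicitly.  */
struct regs
{
  const unsigned char *esi;  /* source pointer */
  unsigned char *edi;        /* destination pointer */
};

/* One "movsl": copy 4 bytes from (%esi) to (%edi), then advance both
   pointers by 4 (direction flag assumed clear).  */
void
movsl (struct regs *r)
{
  memcpy (r->edi, r->esi, 4);
  r->esi += 4;
  r->edi += 4;
}

/* What f() would do with the expected code: four movsl insns and no
   count in %ecx.  */
void *
copy16 (void *d, const void *s)
{
  struct regs r = { s, d };
  movsl (&r);
  movsl (&r);
  movsl (&r);
  movsl (&r);
  return d;
}
```

Each step needs no counter register, which is why the "mov $0x4,%ecx" insn disappears in the expected code.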
The testcase from bug 21329, which has implicit block moves via struct copies
(https://gcc.gnu.org/bugzilla/attachment.cgi?id=8790),
also demonstrates it:
$ gcc -Os -m32 -fomit-frame-pointer -c -o z1.o z1.c && objdump -drw z1.o
z1.o:     file format elf32-i386

Disassembly of section .text:

00000000 <f10>:
   0:   a1 00 00 00 00          mov    0x0,%eax    1: R_386_32 w10
   5:   a3 00 00 00 00          mov    %eax,0x0    6: R_386_32 t10
   a:   c3                      ret

0000000b <f20>:
   b:   a1 00 00 00 00          mov    0x0,%eax    c: R_386_32 w20
  10:   8b 15 04 00 00 00       mov    0x4,%edx    12: R_386_32 w20
  16:   a3 00 00 00 00          mov    %eax,0x0    17: R_386_32 t20
  1b:   89 15 04 00 00 00       mov    %edx,0x4    1d: R_386_32 t20
  21:   c3                      ret

00000022 <f21>:
  22:   57                      push   %edi
  23:   b9 09 00 00 00          mov    $0x9,%ecx
  28:   bf 00 00 00 00          mov    $0x0,%edi    29: R_386_32 t21
  2d:   56                      push   %esi
  2e:   be 00 00 00 00          mov    $0x0,%esi    2f: R_386_32 w21
  33:   f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
  35:   5e                      pop    %esi
  36:   5f                      pop    %edi
  37:   c3                      ret

00000038 <f22>:
  38:   57                      push   %edi
  39:   b9 0a 00 00 00          mov    $0xa,%ecx
  3e:   bf 00 00 00 00          mov    $0x0,%edi    3f: R_386_32 t22
  43:   56                      push   %esi
  44:   be 00 00 00 00          mov    $0x0,%esi    45: R_386_32 w22
  49:   f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
  4b:   5e                      pop    %esi
  4c:   5f                      pop    %edi
  4d:   c3                      ret

0000004e <f23>:
  4e:   57                      push   %edi
  4f:   b9 0b 00 00 00          mov    $0xb,%ecx
  54:   bf 00 00 00 00          mov    $0x0,%edi    55: R_386_32 t23
  59:   56                      push   %esi
  5a:   be 00 00 00 00          mov    $0x0,%esi    5b: R_386_32 w23
  5f:   f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
  61:   5e                      pop    %esi
  62:   5f                      pop    %edi
  63:   c3                      ret

00000064 <f30>:
  64:   57                      push   %edi
  65:   b9 03 00 00 00          mov    $0x3,%ecx
  6a:   bf 00 00 00 00          mov    $0x0,%edi    6b: R_386_32 t30
  6f:   56                      push   %esi
  70:   be 00 00 00 00          mov    $0x0,%esi    71: R_386_32 w30
  75:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
  77:   5e                      pop    %esi
  78:   5f                      pop    %edi
  79:   c3                      ret

0000007a <f40>:
  7a:   57                      push   %edi
  7b:   b9 04 00 00 00          mov    $0x4,%ecx
  80:   bf 00 00 00 00          mov    $0x0,%edi    81: R_386_32 t40
  85:   56                      push   %esi
  86:   be 00 00 00 00          mov    $0x0,%esi    87: R_386_32 w40
  8b:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
  8d:   5e                      pop    %esi
  8e:   5f                      pop    %edi
  8f:   c3                      ret

00000090 <f50>:
  90:   57                      push   %edi
  91:   b9 05 00 00 00          mov    $0x5,%ecx
  96:   bf 00 00 00 00          mov    $0x0,%edi    97: R_386_32 t50
  9b:   56                      push   %esi
  9c:   be 00 00 00 00          mov    $0x0,%esi    9d: R_386_32 w50
  a1:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
  a3:   5e                      pop    %esi
  a4:   5f                      pop    %edi
  a5:   c3                      ret

000000a6 <f60>:
  a6:   57                      push   %edi
  a7:   b9 06 00 00 00          mov    $0x6,%ecx
  ac:   bf 00 00 00 00          mov    $0x0,%edi    ad: R_386_32 t60
  b1:   56                      push   %esi
  b2:   be 00 00 00 00          mov    $0x0,%esi    b3: R_386_32 w60
  b7:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
  b9:   5e                      pop    %esi
  ba:   5f                      pop    %edi
  bb:   c3                      ret

000000bc <f>:
...
* [Bug target/100320] [8/9/10/11/12 Regression] 32-bit x86 memcpy is suboptimal
From: jakub at gcc dot gnu.org @ 2021-04-28 15:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target
   Target Milestone|---                         |8.5
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
   Last reconfirmed|                            |2021-04-28
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
            Summary|regression: 32-bit x86      |[8/9/10/11/12 Regression]
                   |memcpy is suboptimal        |32-bit x86 memcpy is
                   |                            |suboptimal
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Indeed, at least with -minline-all-stringops -Os -m32 -fomit-frame-pointer:
starting with r0-68071-g95935e2db5c45bef5631f51538d1e10d8b5b7524
it was a series of movsl insns, and starting with, most likely,
r0-77675-g8c996513856f2769aee1730cb211050fef055fb5
(can't know for sure, as the compiler then ICEs on it for a couple of
revisions) it is back to rep movsl.
* [Bug target/100320] [8/9/10/11/12 Regression] 32-bit x86 memcpy is suboptimal
From: vda.linux at googlemail dot com @ 2021-04-28 15:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
--- Comment #2 from Denis Vlasenko <vda.linux at googlemail dot com> ---
The relevant code in current git seems to be:
static void
expand_set_or_cpymem_via_rep (rtx destmem, rtx srcmem,
                              rtx destptr, rtx srcptr, rtx value, rtx orig_value,
                              rtx count,
                              machine_mode mode, bool issetmem)
{
  rtx destexp;
  rtx srcexp;
  rtx countreg;
  HOST_WIDE_INT rounded_count;

  /* If possible, it is shorter to use rep movs.
     TODO: Maybe it is better to move this logic to decide_alg. */
  if (mode == QImode && CONST_INT_P (count) && !(INTVAL (count) & 3)
      && !TARGET_PREFER_KNOWN_REP_MOVSB_STOSB
      && (!issetmem || orig_value == const0_rtx))
    mode = SImode;

  if (destptr != XEXP (destmem, 0) || GET_MODE (destmem) != BLKmode)
    destmem = adjust_automodify_address_nv (destmem, BLKmode, destptr, 0);

  countreg = ix86_zero_extend_to_Pmode (scale_counter (count,
                                                       GET_MODE_SIZE (mode)));
  if (mode != QImode)
    {
      destexp = gen_rtx_ASHIFT (Pmode, countreg,
                                GEN_INT (exact_log2 (GET_MODE_SIZE (mode))));
      destexp = gen_rtx_PLUS (Pmode, destexp, destptr);
    }
  else
    destexp = gen_rtx_PLUS (Pmode, destptr, countreg);
  if ((!issetmem || orig_value == const0_rtx) && CONST_INT_P (count))
    {
      rounded_count
        = ROUND_DOWN (INTVAL (count), (HOST_WIDE_INT) GET_MODE_SIZE (mode));
      destmem = shallow_copy_rtx (destmem);
      set_mem_size (destmem, rounded_count);
    }
  else if (MEM_SIZE_KNOWN_P (destmem))
    clear_mem_size (destmem);

  if (issetmem)
    {
      value = force_reg (mode, gen_lowpart (mode, value));
      emit_insn (gen_rep_stos (destptr, countreg, destmem, value, destexp));
    }
  else
    {
      if (srcptr != XEXP (srcmem, 0) || GET_MODE (srcmem) != BLKmode)
        srcmem = adjust_automodify_address_nv (srcmem, BLKmode, srcptr, 0);
      if (mode != QImode)
        {
          srcexp = gen_rtx_ASHIFT (Pmode, countreg,
                                   GEN_INT (exact_log2 (GET_MODE_SIZE
                                                        (mode))));
          srcexp = gen_rtx_PLUS (Pmode, srcexp, srcptr);
        }
      else
        srcexp = gen_rtx_PLUS (Pmode, srcptr, countreg);
      if (CONST_INT_P (count))
        {
          rounded_count
            = ROUND_DOWN (INTVAL (count), (HOST_WIDE_INT) GET_MODE_SIZE
                          (mode));
          srcmem = shallow_copy_rtx (srcmem);
          set_mem_size (srcmem, rounded_count);
        }
      else
        {
          if (MEM_SIZE_KNOWN_P (srcmem))
            clear_mem_size (srcmem);
        }
      emit_insn (gen_rep_mov (destptr, destmem, srcptr, srcmem, countreg,
                              destexp, srcexp));
    }
}
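
As a reading aid, the mode selection at the top of this function can be modelled outside GCC roughly as follows. This is a simplified sketch for a constant count that starts in QImode, not the real implementation. struct rep_plan and plan_rep() are hypothetical names; value_is_zero stands in for "orig_value == const0_rtx" and prefer_movsb for TARGET_PREFER_KNOWN_REP_MOVSB_STOSB. It also assumes that scale_counter divides a constant count by the unit size, which is consistent with the "mov $0x4,%ecx" seen for the 16-byte copy.

```c
#include <stdbool.h>

/* Hypothetical summary of what gets emitted.  */
struct rep_plan
{
  int unit;            /* bytes per iteration: 1 = movsb/stosb, 4 = movsl/stosl */
  long ecx;            /* iteration count loaded into %ecx */
  long rounded_count;  /* ROUND_DOWN (count, unit), in bytes */
};

/* Model of the QImode -> SImode promotion for a constant COUNT.  */
struct rep_plan
plan_rep (long count, bool issetmem, bool value_is_zero, bool prefer_movsb)
{
  struct rep_plan p;

  p.unit = 1;
  /* Mirrors: !(INTVAL (count) & 3) && !TARGET_PREFER_KNOWN_REP_MOVSB_STOSB
     && (!issetmem || orig_value == const0_rtx).  */
  if (!(count & 3) && !prefer_movsb && (!issetmem || value_is_zero))
    p.unit = 4;

  p.ecx = count / p.unit;
  p.rounded_count = count - count % p.unit;
  return p;
}
```

With this model a 16-byte memcpy gives unit 4 and %ecx 4, and a 9-byte copy gives unit 1 and %ecx 9, matching the f and f21 disassembly in the report. Nothing in this path looks at how small the count is, so a short fixed-size copy still goes through rep.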
* [Bug target/100320] [8/9/10/11/12 Regression] 32-bit x86 memcpy is suboptimal
From: rguenth at gcc dot gnu.org @ 2021-04-29 7:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
Keywords| |missed-optimization
Target| |i?86-*-*
* [Bug target/100320] [9/10/11/12 Regression] 32-bit x86 memcpy is suboptimal
From: jakub at gcc dot gnu.org @ 2021-05-14 9:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|8.5 |9.4
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 8 branch is being closed.
* [Bug target/100320] [9/10/11/12 Regression] 32-bit x86 memcpy is suboptimal
From: rguenth at gcc dot gnu.org @ 2021-06-01 8:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|9.4 |9.5
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
* [Bug target/100320] [10/11/12/13 Regression] 32-bit x86 memcpy is suboptimal
From: rguenth at gcc dot gnu.org @ 2022-05-27 9:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|9.5 |10.4
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9 branch is being closed
* [Bug target/100320] [10/11/12/13 Regression] 32-bit x86 memcpy is suboptimal
From: jakub at gcc dot gnu.org @ 2022-06-28 10:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|10.4 |10.5
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
* [Bug target/100320] [11/12/13/14 Regression] 32-bit x86 memcpy is suboptimal
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|10.5 |11.5
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.