public inbox for gcc-bugs@sourceware.org
* [Bug other/46599] New: Possible enhancement for inline stringops with -Os
@ 2010-11-22 10:35 gcc.hall at gmail dot com
2010-11-22 11:15 ` [Bug other/46599] " rguenth at gcc dot gnu.org
2010-11-22 12:24 ` gcc.hall at gmail dot com
0 siblings, 2 replies; 3+ messages in thread
From: gcc.hall at gmail dot com @ 2010-11-22 10:35 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46599
Summary: Possible enhancement for inline stringops with -Os
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: other
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: gcc.hall@gmail.com
Host: Fedora 14
Target: Core i7
Build: GCC 4.5.1 20100924
GCC 4.5.1 20100924 "-Os -minline-all-stringops" on Core i7
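#include <stdio.h>   /* printf */
#include <stdlib.h>  /* rand */
#include <string.h>  /* memcpy */

int
main( int argc, char *argv[] )
{
    int i, a[256], b[256];
    for( i = 0; i < 256; ++i ) // discourage optimization
        a[i] = rand();
    memcpy( b, a, argc * sizeof(int) ); // length is a multiple of 4 bytes
    printf( "%d\n", b[rand() % 256] ); // discourage optimization; index kept in bounds
    return 0;
}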
I wonder whether it is possible to improve the -Os code generation for inline
stringops when the length is known to be a multiple of 4 bytes?
That is, instead of:
movsx rcx, ebp # argc
sal rcx, 2
rep movsb
it would be nice to see:
movsx rcx, ebp # argc
rep movsd
Note that memcpy( b, a, 1024 ) generates:
mov ecx, 256
rep movsd
This is for -Os, which normally emits a movs rather than a loop. The same
applies to stos.
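To make the request concrete, here is a minimal C sketch of the two strategies. It assumes sizeof(int) == 4, and the helper names (copy_bytes, copy_dwords, strategies_agree) are hypothetical and exist only for illustration. It models "rep movsb" with a count of n * 4 and "rep movsd" with a count of n, and checks that both leave the same bytes in the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Models "rep movsb": one byte per iteration, count given in bytes. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t nbytes)
{
    while (nbytes--)
        *dst++ = *src++;
}

/* Models "rep movsd": four bytes per iteration, count given in dwords. */
static void copy_dwords(unsigned char *dst, const unsigned char *src, size_t ndwords)
{
    while (ndwords--) {
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
}

/* Hypothetical helper: copies n ints with both strategies and
 * returns 1 if the two destination buffers are identical. */
static int strategies_agree(const int *src, size_t n)
{
    int by_bytes[256] = {0};
    int by_dwords[256] = {0};

    assert(sizeof(int) == 4);   /* assumption of this sketch */
    assert(n <= 256);

    copy_bytes((unsigned char *)by_bytes, (const unsigned char *)src, n * sizeof(int));
    copy_dwords((unsigned char *)by_dwords, (const unsigned char *)src, n);
    return memcmp(by_bytes, by_dwords, sizeof by_bytes) == 0;
}
```

Because the byte count is n * 4 by construction, the dword version needs neither the shift nor any tail handling, which is the saving asked for here.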
The reason I think this might be possible is as follows.
Use -mstringop-strategy=rep_4byte to force the use of movsd.
For memcpy( b, a, argc * sizeof(int) ) we get:
movsx rcx, ebp # argc
sal rcx, 2
cmp rcx, 4
jb .L5 #,
shr rcx, 2
rep movsd
.L5:
For memcpy( b, a, argc ) we get:
movsx rax, ebp # argc, argc
mov rdi, rsp # tmp76,
lea rsi, [rsp+1024] # tmp77,
cmp rax, 4 # argc,
jb .L3 #,
mov rcx, rax # tmp78, argc
shr rcx, 2 # tmp78,
rep movsd
.L3:
xor edx, edx # tmp80
test al, 2 # argc,
je .L4 #,
mov dx, WORD PTR [rsi] # tmp82,
mov WORD PTR [rdi], dx #, tmp82
mov edx, 2 # tmp80,
.L4:
test al, 1 # argc,
je .L5 #,
mov al, BYTE PTR [rsi+rdx] # tmp85,
mov BYTE PTR [rdi+rdx], al #, tmp85
.L5:
In the former case, memcpy( b, a, argc * sizeof(int) ), GCC has omitted all
the code that deals with the trailing 1, 2, or 3 bytes, so the stringop code
generation has apparently spotted that the length is a multiple of 4 bytes.
I can see that the expression code for the length is generated separately from
the stringop code, though GCC does do the right thing when the length is a
literal.
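For reference, the rep_4byte listing for memcpy( b, a, argc ) can be read as the following C model. This is only a sketch of what the assembly above appears to do, not GCC's actual implementation; the function name copy_gcc_style is hypothetical, and the comments map each step back to the instructions in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of the rep_4byte sequence for an arbitrary length n. */
static void copy_gcc_style(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t off = 0;                 /* xor edx, edx */

    assert(dst != NULL && src != NULL);

    if (n >= 4) {                   /* cmp rax, 4 / jb .L3 */
        size_t dwords = n >> 2;     /* shr rcx, 2 */
        while (dwords--) {          /* rep movsd advances rsi and rdi */
            memcpy(dst, src, 4);
            dst += 4;
            src += 4;
        }
    }
    if (n & 2) {                    /* test al, 2 / je .L4 */
        memcpy(dst, src, 2);        /* 16-bit move through dx */
        off = 2;                    /* mov edx, 2 */
    }
    if (n & 1)                      /* test al, 1 / je .L5 */
        dst[off] = src[off];        /* 8-bit move through al */
}
```

When n is known to be a multiple of 4, both tail tests are always false, which matches the shorter listing for argc * sizeof(int).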
Incidentally, for the second case, memcpy( b, a, argc ), the Visual Studio
compiler generates code like this:
mov eax, ecx
shr ecx, 2
rep movsd
mov ecx, eax
and ecx, 3
rep movsb
which seems cleaner (no jumps) than the GCC code, though knowing GCC there is
probably a good reason for its choice as it generally seems to have a far more
sophisticated optimizer.
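The jump-free sequence above can likewise be sketched in C. Again this is an illustrative model with a hypothetical function name (copy_msvc_style), not the Visual Studio implementation: it copies n >> 2 dwords and then n & 3 trailing bytes, and either loop may run zero times.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of the "rep movsd" + "rep movsb" sequence. */
static void copy_msvc_style(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t dwords = n >> 2;         /* shr ecx, 2 */
    size_t tail = n & 3;            /* and ecx, 3 */

    assert(dst != NULL && src != NULL);

    while (dwords--) {              /* rep movsd */
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
    while (tail--)                  /* rep movsb, 0 to 3 iterations */
        *dst++ = *src++;
}
```

The two rep prefixes take the place of the explicit compare-and-branch steps, since a rep with a zero count copies nothing.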
* [Bug other/46599] Possible enhancement for inline stringops with -Os
From: rguenth at gcc dot gnu.org @ 2010-11-22 11:15 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46599
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-22 11:10:53 UTC ---
-minline-all-stringops isn't supposed to be used (it's for debugging), and
probably doesn't mix well with -Os anyway.
* [Bug other/46599] Possible enhancement for inline stringops with -Os
From: gcc.hall at gmail dot com @ 2010-11-22 12:24 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46599
--- Comment #2 from Jeremy <gcc.hall at gmail dot com> 2010-11-22 12:22:48 UTC ---
(In reply to comment #1)
> -minline-all-stringops isn't supposed to be used (it's for debugging), and
> probably doesn't mix well with -Os anyway.
OK, thanks. I think in this context it's a red herring, as I get identical
results without it for the test program.
In my real app, it only seems to add cmpsb and doesn't affect movs, stos, or
scas anyway.