public inbox for gcc-bugs@sourceware.org
* [Bug other/46599] New: Possible enhancement for inline stringops with -Os
@ 2010-11-22 10:35 gcc.hall at gmail dot com
  2010-11-22 11:15 ` [Bug other/46599] " rguenth at gcc dot gnu.org
  2010-11-22 12:24 ` gcc.hall at gmail dot com
  0 siblings, 2 replies; 3+ messages in thread
From: gcc.hall at gmail dot com @ 2010-11-22 10:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46599

           Summary: Possible enhancement for inline stringops with -Os
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: other
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: gcc.hall@gmail.com
              Host: Fedora 14
            Target: Core i7
             Build: GCC 4.5.1 20100924


Compiled with GCC 4.5.1 20100924 using "-Os -minline-all-stringops" on a Core i7:

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* rand */
#include <string.h>  /* memcpy */

int
main( int argc, char *argv[] )
{
  int i, a[256], b[256];

  for( i = 0; i < 256; ++i )  // discourage optimization
    a[i] = rand();

  memcpy( b, a, argc * sizeof(int) );

  printf( "%d\n", b[rand()] );  // discourage optimization

  return 0;
}

I wonder if it's possible to improve the -Os code generation for inline
stringops when the length is known to be a multiple of 4 bytes?

That is, instead of:

    movsx   rcx, ebp    # argc
    sal rcx, 2
    rep movsb

it would be nice to see:

    movsx   rcx, ebp    # argc
    rep movsd

Note that  memcpy( b, a, 1024 ) generates:

    mov ecx, 256
    rep movsd

This is for -Os which normally emits a movs, not a loop.  The same applies to
stos.
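
As an illustration only, the equivalence behind this request can be sketched in
portable C. The helper names copy_as_bytes and copy_as_dwords are hypothetical
and are not GCC internals; the sketch assumes a non-negative element count small
enough that count * 4 does not overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte-granular copy, analogous to
   "sal rcx, 2" followed by "rep movsb". */
static void copy_as_bytes(void *dst, const void *src, size_t count)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t bytes = count << 2;          /* count * sizeof(int32_t) */

    while (bytes--)
        *d++ = *s++;
}

/* Hypothetical helper: dword-granular copy, analogous to
   "rep movsd" with rcx holding the element count directly. */
static void copy_as_dwords(void *dst, const void *src, size_t count)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (count--) {
        uint32_t w;
        memcpy(&w, s, sizeof w);        /* load one 4-byte unit */
        memcpy(d, &w, sizeof w);        /* store one 4-byte unit */
        s += sizeof w;
        d += sizeof w;
    }
}
```

Both helpers write the same count * 4 bytes; the second simply never needs the
shift, because the length is a whole number of dwords by construction.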

The reason I think this might be possible is as follows.

Use -mstringop-strategy=rep_4byte to force the use of movsd.

For memcpy( b, a, argc * sizeof(int) ) we get:

    movsx   rcx, ebp    # argc
    sal rcx, 2
    cmp rcx, 4
    jb  .L5 #,
    shr rcx, 2
    rep movsd
.L5:


For memcpy( b, a, argc ) we get:

    movsx   rax, ebp    # argc, argc
    mov rdi, rsp    # tmp76,
    lea rsi, [rsp+1024] # tmp77,
    cmp rax, 4  # argc,
    jb  .L3 #,
    mov rcx, rax    # tmp78, argc
    shr rcx, 2  # tmp78,
    rep movsd
.L3:
    xor edx, edx    # tmp80
    test    al, 2   # argc,
    je  .L4 #,
    mov dx, WORD PTR [rsi]  # tmp82,
    mov WORD PTR [rdi], dx  #, tmp82
    mov edx, 2  # tmp80,
.L4:
    test    al, 1   # argc,
    je  .L5 #,
    mov al, BYTE PTR [rsi+rdx]  # tmp85,
    mov BYTE PTR [rdi+rdx], al  #, tmp85
.L5:

In the former case, "memcpy(b, a, argc * sizeof(int))", gcc has omitted all the
code to deal with 1, 2, and 3 trailing bytes, so the stringop code generation
has apparently spotted that the length is a multiple of 4 bytes.
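
For readers who prefer C to assembly, here is a rough sketch of what the
rep_4byte sequence for memcpy( b, a, argc ) does. The function name
copy_rep4_with_tail is hypothetical; this mirrors the listing above and is not
taken from the GCC sources.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the rep_4byte listing for an arbitrary
   length n: a dword copy followed by a 2-byte and a 1-byte tail. */
static void copy_rep4_with_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= 4) {                       /* cmp rax, 4 / jb .L3 */
        size_t dwords = n >> 2;         /* shr rcx, 2 */
        memcpy(d, s, dwords * 4);       /* rep movsd */
        d += dwords * 4;                /* rep movsd advances rdi ... */
        s += dwords * 4;                /* ... and rsi */
    }

    size_t off = 0;                     /* xor edx, edx */
    if (n & 2) {                        /* test al, 2 */
        memcpy(d, s, 2);                /* WORD PTR move */
        off = 2;                        /* mov edx, 2 */
    }
    if (n & 1) {                        /* test al, 1 */
        d[off] = s[off];                /* BYTE PTR move at [rsi+rdx] */
    }
}
```

When n is known to be a multiple of 4, both tail tests are always false, which
matches the shorter listing generated for argc * sizeof(int).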

I can see that the code for the length expression is separate from the stringop
expansion, though it does do the right thing when the length is a literal.

Incidentally, for the second case, memcpy( b, a, argc ), the Visual Studio
compiler generates code like this:

    mov eax, ecx
    shr ecx, 2
    rep movsd
    mov ecx, eax
    and ecx, 3
    rep movsb

which seems cleaner (no jumps) than the GCC code, though knowing GCC there is
probably a good reason for its choice as it generally seems to have a far more
sophisticated optimizer.
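
A rough C rendering of the Visual Studio sequence, for comparison. The function
name copy_dwords_then_bytes is hypothetical; the sketch only mirrors the six
instructions quoted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n >> 2 dwords, then n & 3 bytes.
   A rep-prefixed instruction with a zero count copies nothing, so the
   assembly needs no explicit branch; the loops below express the same
   thing in C. */
static void copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = n >> 2;             /* shr ecx, 2 */
    size_t tail = n & 3;                /* and ecx, 3 */

    for (size_t i = 0; i < dwords; ++i) {   /* rep movsd */
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        s += sizeof w;
        d += sizeof w;
    }
    for (size_t i = 0; i < tail; ++i)       /* rep movsb */
        *d++ = *s++;
}
```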



* [Bug other/46599] Possible enhancement for inline stringops with -Os
  2010-11-22 10:35 [Bug other/46599] New: Possible enhancement for inline stringops with -Os gcc.hall at gmail dot com
@ 2010-11-22 11:15 ` rguenth at gcc dot gnu.org
  2010-11-22 12:24 ` gcc.hall at gmail dot com
  1 sibling, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-22 11:15 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46599

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-22 11:10:53 UTC ---
-minline-all-stringops isn't supposed to be used (it's for debugging), and
probably doesn't mix well with -Os anyway.



* [Bug other/46599] Possible enhancement for inline stringops with -Os
  2010-11-22 10:35 [Bug other/46599] New: Possible enhancement for inline stringops with -Os gcc.hall at gmail dot com
  2010-11-22 11:15 ` [Bug other/46599] " rguenth at gcc dot gnu.org
@ 2010-11-22 12:24 ` gcc.hall at gmail dot com
  1 sibling, 0 replies; 3+ messages in thread
From: gcc.hall at gmail dot com @ 2010-11-22 12:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46599

--- Comment #2 from Jeremy <gcc.hall at gmail dot com> 2010-11-22 12:22:48 UTC ---
(In reply to comment #1)
> -minline-all-stringops isn't supposed to be used (it's for debugging), and
> probably doesn't mix well with -Os anyway.

OK, thanks.  I think in this context it's a red herring, as I get identical
results without it for the test program.

In my real app, it seems only to add cmpsb and doesn't affect movs, stos, or
scas anyway.

