public inbox for gcc-help@gcc.gnu.org
* Surprisingly bad code generated near char*
@ 2016-08-18  8:46 Avi Kivity
  2016-08-18 12:33 ` Manuel López-Ibáñez
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2016-08-18  8:46 UTC (permalink / raw)
  To: gcc-help

I wanted to test how restrict helps in code generation.  I started with 
this example:


   struct s { int a; int b; };

   inline
   void encode(int a, char* p) {
     for (unsigned i = 0; i < sizeof(a); ++i) {
       p[i] = reinterpret_cast<const char*>(&a)[i];
     }
   }

   void f(s* x, char* p) {
     encode(x->a, p + 0);
     encode(x->b, p + 4);
   }


This simulates serialization code.  I expected that, without 
__restrict, I'd get four instructions:


    mov (%rdi), %eax
    mov %eax, (%rsi)
    mov 4(%rdi), %eax
    mov %eax, 4(%rsi)


My reasoning: while x and p can alias, a and p cannot, because a is a 
local variable.  I further hoped that adding __restrict would remove two 
instructions:


    mov (%rdi), %rax
    mov %rax, (%rsi)


since the compiler now knows that x and p do not alias.
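
To make the aliasing concern concrete, here is a minimal sketch (not part of 
the original test) that calls f() with p pointing into *x.  The helper name 
second_word_after_overlap is hypothetical, and the code assumes a 4-byte int 
and no padding in struct s, as on x86-64:

```cpp
#include <cassert>

struct s { int a; int b; };

static_assert(sizeof(int) == 4 && sizeof(s) == 8,
              "sketch assumes 4-byte int and no padding in s");

inline void encode(int a, char* p) {
  for (unsigned i = 0; i < sizeof(a); ++i) {
    p[i] = reinterpret_cast<const char*>(&a)[i];
  }
}

void f(s* x, char* p) {
  encode(x->a, p + 0);
  encode(x->b, p + 4);
}

// Hypothetical demo helper: calls f() with p pointing at x->b, so the
// first encode() overwrites x->b before the second encode() reads it.
int second_word_after_overlap(int a, int b) {
  s arr[2] = {{a, b}, {0, 0}};
  char* p = reinterpret_cast<char*>(arr) + sizeof(int);  // bytes of arr[0].b
  f(&arr[0], p);
  assert(arr[0].b == a);  // the first store clobbered x->b
  return arr[1].a;        // written by the second encode(): holds a, not b
}
```

With this overlap the second encode() has to observe the first store, so 
merging both words into a single 8-byte load and store is only valid once 
the compiler knows that x and p are disjoint.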


However, the generated code is much poorer than this (-O2):


    0:    8b 07                    mov    (%rdi),%eax
    2:    89 c1                    mov    %eax,%ecx
    4:    88 06                    mov    %al,(%rsi)
    6:    66 c1 e9 08              shr    $0x8,%cx
    a:    88 4e 01                 mov    %cl,0x1(%rsi)
    d:    89 c1                    mov    %eax,%ecx
    f:    c1 e8 18                 shr    $0x18,%eax
   12:    c1 e9 10                 shr    $0x10,%ecx
   15:    88 46 03                 mov    %al,0x3(%rsi)
   18:    88 4e 02                 mov    %cl,0x2(%rsi)
   1b:    8b 47 04                 mov    0x4(%rdi),%eax
   1e:    89 c7                    mov    %eax,%edi
   20:    89 c1                    mov    %eax,%ecx
   22:    88 46 04                 mov    %al,0x4(%rsi)
   25:    66 c1 ef 08              shr    $0x8,%di
   29:    c1 e9 10                 shr    $0x10,%ecx
   2c:    c1 e8 18                 shr    $0x18,%eax
   2f:    40 88 7e 05              mov    %dil,0x5(%rsi)
   33:    88 4e 06                 mov    %cl,0x6(%rsi)
   36:    88 46 07                 mov    %al,0x7(%rsi)


gcc doesn't even recognize the idiom of writing a word's four bytes 
sequentially.  With -O3, there is some improvement:


    0:    8b 07                    mov    (%rdi),%eax
    2:    89 06                    mov    %eax,(%rsi)
    4:    8b 47 04                 mov    0x4(%rdi),%eax
    7:    89 c1                    mov    %eax,%ecx
    9:    88 46 04                 mov    %al,0x4(%rsi)
    c:    66 c1 e9 08              shr    $0x8,%cx
   10:    88 4e 05                 mov    %cl,0x5(%rsi)
   13:    89 c1                    mov    %eax,%ecx
   15:    c1 e8 18                 shr    $0x18,%eax
   18:    c1 e9 10                 shr    $0x10,%ecx
   1b:    88 46 07                 mov    %al,0x7(%rsi)
   1e:    88 4e 06                 mov    %cl,0x6(%rsi)

The copy of the first word is optimized, but the second one is not, even 
though they're exactly the same.


Adding __restrict did not help.
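
The __restrict variant itself is not shown here.  A plausible sketch, 
assuming the qualifier goes on both parameters of f() and on encode()'s 
destination (the placement, the name f_restrict and the helper 
matches_plain_copy are assumptions; __restrict is a GNU extension that 
Clang also accepts):

```cpp
#include <cassert>
#include <cstring>

struct s { int a; int b; };

static_assert(sizeof(s) == 8, "sketch assumes 4-byte int and no padding");

inline void encode(int a, char* __restrict p) {
  for (unsigned i = 0; i < sizeof(a); ++i) {
    p[i] = reinterpret_cast<const char*>(&a)[i];
  }
}

// Promise to the compiler: nothing written through p is read through x.
void f_restrict(s* __restrict x, char* __restrict p) {
  encode(x->a, p + 0);
  encode(x->b, p + 4);
}

// Hypothetical check: for disjoint arguments, f_restrict() produces the
// same bytes as one 8-byte memcpy of the struct.
bool matches_plain_copy(s in) {
  char out[sizeof(s)];
  char expected[sizeof(s)];
  f_restrict(&in, out);
  std::memcpy(expected, &in, sizeof(s));
  assert(sizeof(out) == sizeof(expected));
  return std::memcmp(out, expected, sizeof(s)) == 0;
}
```

Calling f_restrict() with overlapping arguments would break that promise 
and is undefined behaviour, so the check only uses disjoint buffers.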


Is this a problem in gcc, or are my expectations incorrect? I'm 
particularly worried that gcc recognized the copy idiom, but did not 
apply it to the second word, and required -O3 to optimize it.


Using std::copy_n() helped, but __restrict did not:


    0:    8b 07                    mov    (%rdi),%eax
    2:    89 06                    mov    %eax,(%rsi)
    4:    8b 47 04                 mov    0x4(%rdi),%eax
    7:    89 46 04                 mov    %eax,0x4(%rsi)

so the optimization opportunity is still missed.
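
The std::copy_n() version is not shown above either; a sketch of what 
encode() might look like with that change (the exact form is an assumption, 
and the helper copy_n_roundtrip is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

struct s { int a; int b; };

static_assert(sizeof(s) == 8, "sketch assumes 4-byte int and no padding");

inline void encode(int a, char* p) {
  // Copy the object representation of a through the library algorithm
  // instead of a hand-written byte loop.
  std::copy_n(reinterpret_cast<const char*>(&a), sizeof(a), p);
}

void f(s* x, char* p) {
  encode(x->a, p + 0);
  encode(x->b, p + 4);
}

// Hypothetical check: decoding the buffer yields the original fields.
bool copy_n_roundtrip(s in) {
  char out[sizeof(s)];
  f(&in, out);
  s back;
  std::memcpy(&back, out, sizeof(back));
  assert(back.a == in.a);
  return back.a == in.a && back.b == in.b;
}
```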


gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)


* Re: Surprisingly bad code generated near char*
  2016-08-18  8:46 Surprisingly bad code generated near char* Avi Kivity
@ 2016-08-18 12:33 ` Manuel López-Ibáñez
  2016-08-18 13:35   ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Manuel López-Ibáñez @ 2016-08-18 12:33 UTC (permalink / raw)
  To: Avi Kivity, GCC-help

On 18/08/16 09:45, Avi Kivity wrote:
> I wanted to test how restrict helps in code generation.  I started with this
> example:

There are quite a few missed optimizations with restrict: 
https://gcc.gnu.org/PR49774

If your issue is not in that list, you may wish to open a new PR.

Cheers,
	Manuel.


* Re: Surprisingly bad code generated near char*
  2016-08-18 12:33 ` Manuel López-Ibáñez
@ 2016-08-18 13:35   ` Avi Kivity
  2016-08-18 13:41     ` Andrew Haley
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2016-08-18 13:35 UTC (permalink / raw)
  To: Manuel López-Ibáñez, GCC-help



On 08/18/2016 03:33 PM, Manuel López-Ibáñez wrote:
> On 18/08/16 09:45, Avi Kivity wrote:
>> I wanted to test how restrict helps in code generation.  I started 
>> with this
>> example:
>
> There are quite a few missed optimizations with restrict: 
> https://gcc.gnu.org/PR49774
>
> If your issue is not in that list, you may wish to open a new PR.
>

Here, the missed optimizations started even before I started playing 
with restrict.  But I'll see if I need to file the last one.


* Re: Surprisingly bad code generated near char*
  2016-08-18 13:35   ` Avi Kivity
@ 2016-08-18 13:41     ` Andrew Haley
  2016-08-18 18:55       ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Haley @ 2016-08-18 13:41 UTC (permalink / raw)
  To: Avi Kivity, Manuel López-Ibáñez, GCC-help

On 18/08/16 14:35, Avi Kivity wrote:
>
> On 08/18/2016 03:33 PM, Manuel López-Ibáñez wrote:
>> On 18/08/16 09:45, Avi Kivity wrote:
>>> I wanted to test how restrict helps in code generation.  I started 
>>> with this
>>> example:
>>
>> There are quite a few missed optimizations with restrict: 
>> https://gcc.gnu.org/PR49774
>>
>> If your issue is not in that list, you may wish to open a new PR.
> 
> Here, the missed optimizations started even before I started playing 
> with restrict.  But I'll see if I need to file the last one.

I haven't been able to duplicate this behaviour on any GCC to which I
have access.

Andrew.


* Re: Surprisingly bad code generated near char*
  2016-08-18 13:41     ` Andrew Haley
@ 2016-08-18 18:55       ` Avi Kivity
  2016-08-19  7:32         ` Andrew Haley
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2016-08-18 18:55 UTC (permalink / raw)
  To: Andrew Haley, Manuel López-Ibáñez, GCC-help

On 08/18/2016 04:41 PM, Andrew Haley wrote:
> On 18/08/16 14:35, Avi Kivity wrote:
>> On 08/18/2016 03:33 PM, Manuel López-Ibáñez wrote:
>>> On 18/08/16 09:45, Avi Kivity wrote:
>>>> I wanted to test how restrict helps in code generation.  I started
>>>> with this
>>>> example:
>>> There are quite a few missed optimizations with restrict:
>>> https://gcc.gnu.org/PR49774
>>>
>>> If your issue is not in that list, you may wish to open a new PR.
>> Here, the missed optimizations started even before I started playing
>> with restrict.  But I'll see if I need to file the last one.
> I haven't been able to duplicate this behaviour on any GCC to which I
> have access.
>

I replicated it on Fedora 24's gcc; in fact -O2 code is worse (but -O3 
is fine):

$ gcc --version
gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cat restrict.cc
   struct s { int a; int b; };

   inline
   void encode(int a, char* p) {
     for (unsigned i = 0; i < sizeof(a); ++i) {
       p[i] = reinterpret_cast<const char*>(&a)[i];
     }
   }

   void f(s* x, char* p) {
     encode(x->a, p + 0);
     encode(x->b, p + 4);
   }

$ g++ -O2 -c restrict.cc

$ objdump -SrC restrict.o

restrict.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <f(s*, char*)>:
    0:    8b 07                    mov    (%rdi),%eax
    2:    89 44 24 fc              mov    %eax,-0x4(%rsp)
    6:    31 c0                    xor    %eax,%eax
    8:    0f b6 54 04 fc           movzbl -0x4(%rsp,%rax,1),%edx
    d:    88 14 06                 mov    %dl,(%rsi,%rax,1)
   10:    48 83 c0 01              add    $0x1,%rax
   14:    48 83 f8 04              cmp    $0x4,%rax
   18:    75 ee                    jne    8 <f(s*, char*)+0x8>
   1a:    8b 47 04                 mov    0x4(%rdi),%eax
   1d:    89 c1                    mov    %eax,%ecx
   1f:    88 46 04                 mov    %al,0x4(%rsi)
   22:    66 c1 e9 08              shr    $0x8,%cx
   26:    88 4e 05                 mov    %cl,0x5(%rsi)
   29:    89 c1                    mov    %eax,%ecx
   2b:    c1 e8 18                 shr    $0x18,%eax
   2e:    c1 e9 10                 shr    $0x10,%ecx
   31:    88 46 07                 mov    %al,0x7(%rsi)
   34:    88 4e 06                 mov    %cl,0x6(%rsi)
   37:    c3                       retq


$ g++ -O3 -c restrict.cc

$ objdump -SrC restrict.o

restrict.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <f(s*, char*)>:
    0:    8b 07                    mov    (%rdi),%eax
    2:    89 06                    mov    %eax,(%rsi)
    4:    8b 47 04                 mov    0x4(%rdi),%eax
    7:    89 46 04                 mov    %eax,0x4(%rsi)
    a:    c3                       retq


Adding __restrict did not improve -O3 code generation.


* Re: Surprisingly bad code generated near char*
  2016-08-18 18:55       ` Avi Kivity
@ 2016-08-19  7:32         ` Andrew Haley
  2016-08-19  7:39           ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Haley @ 2016-08-19  7:32 UTC (permalink / raw)
  To: Avi Kivity, Manuel López-Ibáñez, GCC-help

On 18/08/16 19:54, Avi Kivity wrote:
> I replicated it on Fedora 24's gcc; in fact -O2 code is worse (but -O3 
> is fine):

So -O3 is fine; -O2 isn't doing the analysis to merge the stores.  No bug.

Andrew.


* Re: Surprisingly bad code generated near char*
  2016-08-19  7:32         ` Andrew Haley
@ 2016-08-19  7:39           ` Avi Kivity
  0 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2016-08-19  7:39 UTC (permalink / raw)
  To: Andrew Haley, Manuel López-Ibáñez, GCC-help

On 08/19/2016 10:32 AM, Andrew Haley wrote:
> On 18/08/16 19:54, Avi Kivity wrote:
>> I replicated it on Fedora 24's gcc; in fact -O2 code is worse (but -O3
>> is fine):
> So -O3 is fine; -O2 isn't doing the analysis to merge the stores.  No bug.

Ok.  5.3.1 missed it on -O3 too, but as it's just a missed optimization, 
it's not too bad.

I'll file a bug for the missed optimization with __restrict and -O3.

> Andrew.
>
