* Surprisingly bad code generated near char*
From: Avi Kivity @ 2016-08-18 8:46 UTC (permalink / raw)
To: gcc-help
I wanted to test how restrict helps in code generation. I started with
this example:
struct s { int a; int b; };

inline
void encode(int a, char* p) {
    for (unsigned i = 0; i < sizeof(a); ++i) {
        p[i] = reinterpret_cast<const char*>(&a)[i];
    }
}

void f(s* x, char* p) {
    encode(x->a, p + 0);
    encode(x->b, p + 4);
}
This simulates serialization code. I expected that, without __restrict,
f() would compile to four instructions:
    mov (%rdi), %eax
    mov %eax, (%rsi)
    mov 4(%rdi), %eax
    mov %eax, 4(%rsi)
While x and p can alias, a and p cannot, because a is a local variable.
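To make the aliasing point concrete, here is a small sketch in which p points into the same storage as x. The helper name and the test values are illustrative and not from the original post, and the sketch assumes a 4-byte int with no padding in s. Because the first encode() overwrites x->b before the second one reads it, the compiler cannot merge the two 4-byte copies unless it knows x and p do not overlap.

```cpp
#include <cstddef>

struct s { int a; int b; };
static_assert(sizeof(int) == 4 && sizeof(s) == 8,
              "sketch assumes 4-byte int and no padding");

inline void encode(int a, char* p) {
    for (unsigned i = 0; i < sizeof(a); ++i) {
        p[i] = reinterpret_cast<const char*>(&a)[i];
    }
}

void f(s* x, char* p) {
    encode(x->a, p + 0);
    encode(x->b, p + 4);
}

// Hypothetical helper: p starts at arr[0].b, so the first encode() changes
// the value that the second encode() then reads through x->b.
bool overlapping_call_sees_first_store() {
    s arr[2] = {{1, 2}, {3, 4}};
    char* p = reinterpret_cast<char*>(arr) + offsetof(s, b);
    f(&arr[0], p);  // writes arr[0].b first, then arr[1].a
    return arr[0].b == 1 && arr[1].a == 1;
}
```

Had the two copies been merged into one 8-byte load followed by one 8-byte store, arr[1].a would have received the old value of b (2) instead, which is why the merge is only valid once x and p are known not to overlap.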
I further hoped that adding __restrict would remove two instructions:
    mov (%rdi), %rax
    mov %rax, (%rsi)
since the compiler now knows that x and p do not alias.
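The post does not show where __restrict was added, so the following is only a sketch of one plausible variant, with both parameters of f() annotated. The names f_restrict and f_restrict_round_trips are hypothetical, and __restrict is a compiler extension (accepted by GCC and Clang in C++), not standard C++.

```cpp
#include <cstring>

struct s { int a; int b; };
static_assert(sizeof(int) == 4 && sizeof(s) == 8,
              "sketch assumes 4-byte int and no padding");

inline void encode(int a, char* p) {
    for (unsigned i = 0; i < sizeof(a); ++i) {
        p[i] = reinterpret_cast<const char*>(&a)[i];
    }
}

// Assumption: __restrict on both pointers promises the compiler that the
// storage reached through x and through p does not overlap.
void f_restrict(s* __restrict x, char* __restrict p) {
    encode(x->a, p + 0);
    encode(x->b, p + 4);
}

// Hypothetical check: the serialized bytes equal the struct's own bytes.
bool f_restrict_round_trips() {
    s v{0x11223344, 0x55667788};
    char buf[sizeof(s)];
    f_restrict(&v, buf);
    return std::memcmp(buf, &v, sizeof(v)) == 0;
}
```

The annotation only grants permission to merge the copies; whether a given compiler version takes advantage of it has to be checked in the disassembly.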
However, the generated code is much poorer than this (-O2):
0: 8b 07 mov (%rdi),%eax
2: 89 c1 mov %eax,%ecx
4: 88 06 mov %al,(%rsi)
6: 66 c1 e9 08 shr $0x8,%cx
a: 88 4e 01 mov %cl,0x1(%rsi)
d: 89 c1 mov %eax,%ecx
f: c1 e8 18 shr $0x18,%eax
12: c1 e9 10 shr $0x10,%ecx
15: 88 46 03 mov %al,0x3(%rsi)
18: 88 4e 02 mov %cl,0x2(%rsi)
1b: 8b 47 04 mov 0x4(%rdi),%eax
1e: 89 c7 mov %eax,%edi
20: 89 c1 mov %eax,%ecx
22: 88 46 04 mov %al,0x4(%rsi)
25: 66 c1 ef 08 shr $0x8,%di
29: c1 e9 10 shr $0x10,%ecx
2c: c1 e8 18 shr $0x18,%eax
2f: 40 88 7e 05 mov %dil,0x5(%rsi)
33: 88 4e 06 mov %cl,0x6(%rsi)
36: 88 46 07 mov %al,0x7(%rsi)
gcc doesn't even recognize the idiom of writing a word's four bytes
sequentially. With -O3, there is some improvement:
0: 8b 07 mov (%rdi),%eax
2: 89 06 mov %eax,(%rsi)
4: 8b 47 04 mov 0x4(%rdi),%eax
7: 89 c1 mov %eax,%ecx
9: 88 46 04 mov %al,0x4(%rsi)
c: 66 c1 e9 08 shr $0x8,%cx
10: 88 4e 05 mov %cl,0x5(%rsi)
13: 89 c1 mov %eax,%ecx
15: c1 e8 18 shr $0x18,%eax
18: c1 e9 10 shr $0x10,%ecx
1b: 88 46 07 mov %al,0x7(%rsi)
1e: 88 4e 06 mov %cl,0x6(%rsi)
The copy of the first word is optimized, but the copy of the second is
not, even though the two are exactly the same.
Adding __restrict did not help.
Is this a problem in gcc, or are my expectations incorrect? I'm
particularly worried that gcc recognized the copy idiom, but did not
apply it to the second word, and required -O3 to optimize it.
Using std::copy_n() helped, but __restrict did not:
0: 8b 07 mov (%rdi),%eax
2: 89 06 mov %eax,(%rsi)
4: 8b 47 04 mov 0x4(%rdi),%eax
7: 89 46 04 mov %eax,0x4(%rsi)
so the opportunity to merge the two 32-bit copies into a single 64-bit
copy is still missed.
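The std::copy_n() version itself is not shown in the post. A minimal sketch of what it might look like follows; the names encode_copy_n, f_copy_n and f_copy_n_round_trips are hypothetical.

```cpp
#include <algorithm>
#include <cstring>

struct s { int a; int b; };
static_assert(sizeof(int) == 4 && sizeof(s) == 8,
              "sketch assumes 4-byte int and no padding");

// Same byte-wise copy as the hand-written loop, expressed via std::copy_n.
inline void encode_copy_n(int a, char* p) {
    std::copy_n(reinterpret_cast<const char*>(&a), sizeof(a), p);
}

void f_copy_n(s* x, char* p) {
    encode_copy_n(x->a, p + 0);
    encode_copy_n(x->b, p + 4);
}

// Hypothetical check: the serialized bytes equal the struct's own bytes.
bool f_copy_n_round_trips() {
    s v{0x11223344, 0x55667788};
    char buf[sizeof(s)];
    f_copy_n(&v, buf);
    return std::memcmp(buf, &v, sizeof(v)) == 0;
}
```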
gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
* Re: Surprisingly bad code generated near char*
From: Manuel López-Ibáñez @ 2016-08-18 12:33 UTC (permalink / raw)
To: Avi Kivity, GCC-help
On 18/08/16 09:45, Avi Kivity wrote:
> I wanted to test how restrict helps in code generation. I started with this
> example:
There are quite a few missed optimizations with restrict:
https://gcc.gnu.org/PR49774
If your issue is not in that list, you may wish to open a new PR.
Cheers,
Manuel.
* Re: Surprisingly bad code generated near char*
From: Avi Kivity @ 2016-08-18 13:35 UTC (permalink / raw)
To: Manuel López-Ibáñez, GCC-help
On 08/18/2016 03:33 PM, Manuel López-Ibáñez wrote:
> On 18/08/16 09:45, Avi Kivity wrote:
>> I wanted to test how restrict helps in code generation. I started
>> with this
>> example:
>
> There are quite a few missed optimizations with restrict:
> https://gcc.gnu.org/PR49774
>
> If your issue is not in that list, you may wish to open a new PR.
>
Here, the missed optimizations started even before I started playing
with restrict. But I'll see if I need to file the last one.
* Re: Surprisingly bad code generated near char*
From: Andrew Haley @ 2016-08-18 13:41 UTC (permalink / raw)
To: Avi Kivity, Manuel López-Ibáñez, GCC-help
On 18/08/16 14:35, Avi Kivity wrote:
>
> On 08/18/2016 03:33 PM, Manuel López-Ibáñez wrote:
>> On 18/08/16 09:45, Avi Kivity wrote:
>>> I wanted to test how restrict helps in code generation. I started
>>> with this
>>> example:
>>
>> There are quite a few missed optimizations with restrict:
>> https://gcc.gnu.org/PR49774
>>
>> If your issue is not in that list, you may wish to open a new PR.
>
> Here, the missed optimizations started even before I started playing
> with restrict. But I'll see if I need to file the last one.
I haven't been able to duplicate this behaviour on any GCC to which I
have access.
Andrew.
* Re: Surprisingly bad code generated near char*
From: Avi Kivity @ 2016-08-18 18:55 UTC (permalink / raw)
To: Andrew Haley, Manuel López-Ibáñez, GCC-help
On 08/18/2016 04:41 PM, Andrew Haley wrote:
> On 18/08/16 14:35, Avi Kivity wrote:
>> On 08/18/2016 03:33 PM, Manuel López-Ibáñez wrote:
>>> On 18/08/16 09:45, Avi Kivity wrote:
>>>> I wanted to test how restrict helps in code generation. I started
>>>> with this
>>>> example:
>>> There are quite a few missed optimizations with restrict:
>>> https://gcc.gnu.org/PR49774
>>>
>>> If your issue is not in that list, you may wish to open a new PR.
>> Here, the missed optimizations started even before I started playing
>> with restrict. But I'll see if I need to file the last one.
> I haven't been able to duplicate this behaviour on any GCC to which I
> have access.
>
I replicated it on Fedora 24's gcc; in fact -O2 code is worse (but -O3
is fine):
$ gcc --version
gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cat restrict.cc
struct s { int a; int b; };

inline
void encode(int a, char* p) {
    for (unsigned i = 0; i < sizeof(a); ++i) {
        p[i] = reinterpret_cast<const char*>(&a)[i];
    }
}

void f(s* x, char* p) {
    encode(x->a, p + 0);
    encode(x->b, p + 4);
}
$ g++ -O2 -c restrict.cc
$ objdump -SrC restrict.o
restrict.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f(s*, char*)>:
0: 8b 07 mov (%rdi),%eax
2: 89 44 24 fc mov %eax,-0x4(%rsp)
6: 31 c0 xor %eax,%eax
8: 0f b6 54 04 fc movzbl -0x4(%rsp,%rax,1),%edx
d: 88 14 06 mov %dl,(%rsi,%rax,1)
10: 48 83 c0 01 add $0x1,%rax
14: 48 83 f8 04 cmp $0x4,%rax
18: 75 ee jne 8 <f(s*, char*)+0x8>
1a: 8b 47 04 mov 0x4(%rdi),%eax
1d: 89 c1 mov %eax,%ecx
1f: 88 46 04 mov %al,0x4(%rsi)
22: 66 c1 e9 08 shr $0x8,%cx
26: 88 4e 05 mov %cl,0x5(%rsi)
29: 89 c1 mov %eax,%ecx
2b: c1 e8 18 shr $0x18,%eax
2e: c1 e9 10 shr $0x10,%ecx
31: 88 46 07 mov %al,0x7(%rsi)
34: 88 4e 06 mov %cl,0x6(%rsi)
37: c3 retq
$ g++ -O3 -c restrict.cc
$ objdump -SrC restrict.o
restrict.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f(s*, char*)>:
0: 8b 07 mov (%rdi),%eax
2: 89 06 mov %eax,(%rsi)
4: 8b 47 04 mov 0x4(%rdi),%eax
7: 89 46 04 mov %eax,0x4(%rsi)
a: c3 retq
Adding __restrict did not improve -O3 code generation.
* Re: Surprisingly bad code generated near char*
From: Andrew Haley @ 2016-08-19 7:32 UTC (permalink / raw)
To: Avi Kivity, Manuel López-Ibáñez, GCC-help
On 18/08/16 19:54, Avi Kivity wrote:
> I replicated it on Fedora 24's gcc; in fact -O2 code is worse (but -O3
> is fine):
So -O3 is fine; -O2 isn't doing the analysis to merge the stores. No bug.
Andrew.
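Since the byte loop only becomes a word store when the optimizer unrolls it and merges the stores, one commonly used alternative is to copy the object representation with std::memcpy. Nobody in this thread proposed or measured it, so the sketch below is only an illustration, and all names in it are hypothetical. Compilers typically expand a small fixed-size memcpy inline, but that is not guaranteed and should be confirmed with objdump for the compiler and flags in use.

```cpp
#include <cstring>

struct s { int a; int b; };
static_assert(sizeof(int) == 4 && sizeof(s) == 8,
              "sketch assumes 4-byte int and no padding");

// Copies the four bytes of a into p in one call instead of a byte loop.
inline void encode_memcpy(int a, char* p) {
    std::memcpy(p, &a, sizeof(a));
}

void f_memcpy(s* x, char* p) {
    encode_memcpy(x->a, p + 0);
    encode_memcpy(x->b, p + 4);
}

// Hypothetical check: the serialized bytes equal the struct's own bytes.
bool f_memcpy_round_trips() {
    s v{0x11223344, 0x55667788};
    char buf[sizeof(s)];
    f_memcpy(&v, buf);
    return std::memcmp(buf, &v, sizeof(v)) == 0;
}
```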
* Re: Surprisingly bad code generated near char*
From: Avi Kivity @ 2016-08-19 7:39 UTC (permalink / raw)
To: Andrew Haley, Manuel López-Ibáñez, GCC-help
On 08/19/2016 10:32 AM, Andrew Haley wrote:
> On 18/08/16 19:54, Avi Kivity wrote:
>> I replicated it on Fedora 24's gcc; in fact -O2 code is worse (but -O3
>> is fine):
> So -O3 is fine; -O2 isn't doing the analysis to merge the stores. No bug.
Ok. 5.3.1 missed it on -O3 too, but as it's just a missed optimization,
it's not too bad.
I'll file a bug for the missed optimization with __restrict and -O3.
> Andrew.
>