public inbox for gcc-help@gcc.gnu.org
* x86 gcc lacks simple optimization
@ 2013-12-06  8:31 Konstantin Vladimirov
  2013-12-06  9:28 ` David Brown
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Konstantin Vladimirov @ 2013-12-06  8:31 UTC (permalink / raw)
  To: gcc, gcc-help

Hi,

Consider this code:

int foo(char *t, char *v, int w)
{
    int i;

    for (i = 1; i != w; ++i)
    {
        int x = i << 2;
        v[x + 4] = t[x + 4];
    }

    return 0;
}

Compile it for x86 (I used both gcc 4.7.2 and gcc 4.8.1) with these options:

gcc -O2 -m32 -S test.c

You will see a loop like this:

.L5:
    leal    0(,%eax,4), %edx
    addl    $1, %eax
    movzbl  4(%edi,%edx), %ecx
    cmpl    %ebx, %eax
    movb    %cl, 4(%esi,%edx)
    jne     .L5

But it can be easily simplified to something like this:

.L5:
    addl    $1, %eax
    movzbl  (%esi,%eax,4), %edx
    cmpl    %ecx, %eax
    movb    %dl, (%ebx,%eax,4)
    jne     .L5

(i.e. the left shift can be folded into the addressing mode).
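
To make the claim concrete, here is a rough C sketch of why the two loops touch the same bytes: the offset (i << 2) + 4 used by the first loop equals (i + 1) * 4, which is what a scaled-index operand computes once the counter has been incremented. The function names are invented for this illustration and are not part of gcc or the original test case.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset as written in the source: x = i << 2, then index x + 4. */
size_t offset_shift_then_add(int i)
{
    int x = i << 2;
    return (size_t)(x + 4);
}

/* Same offset with the increment done first: (i + 1) * 4.
   This is the shape a scaled-index operand such as (%base,%index,4)
   computes after "addl $1, %index". */
size_t offset_scaled_index(int i)
{
    return (size_t)(i + 1) * 4;
}

/* Counts the indices in [1, limit) where the two forms disagree. */
int count_offset_mismatches(int limit)
{
    int mismatches = 0;

    assert(limit >= 1);
    for (int i = 1; i < limit; ++i)
    {
        if (offset_shift_then_add(i) != offset_scaled_index(i))
            ++mismatches;
    }
    return mismatches;
}
```

For any small non-negative i the two offsets agree, so count_offset_mismatches() is expected to return 0 for a modest limit. This only checks the arithmetic; it says nothing about which loop is faster on a given CPU.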

First, a question for the gcc-help mailing list: maybe there are some
options that I've missed, and there IS a way to tell gcc that this is
what I want?

Second, a question for the gcc developers' mailing list: I am working
on a private backend and want to add this optimization to it. What do
you advise me to do: write a custom GIMPLE pass, write an RTL pass,
modify some existing pass, or something else?

---
With best regards, Konstantin

* Re: x86 gcc lacks simple optimization
@ 2013-12-06  9:40 Konstantin Vladimirov
  0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Vladimirov @ 2013-12-06  9:40 UTC (permalink / raw)
  To: David Brown; +Cc: gcc-help, gcc

Hi,

The x86 example was only for ease of reproduction. I am pretty sure
this is an architecture-independent issue. For example, on ARM:

.L2:
    mov     ip, r3, asl #2
    add     ip, ip, #4
    add     r3, r3, #1
    ldrb    r4, [r0, ip] @ zero_extendqisi2
    cmp     r3, r2
    strb    r4, [r1, ip]
    bne     .L2

This could be improved to:

.L2:
    add     r3, r3, #1
    ldrb    ip, [r0, r3, asl #2] @ zero_extendqisi2
    cmp     r3, r2
    strb    ip, [r1, r3, asl #2]
    bne     .L2

And so on. I personally feel more comfortable with x86, but that is
only a matter of taste.

To get the improved version of the code, I just did by hand what the
compiler is expected to do automatically, i.e. I rewrote the loop as:

int foo(char *t, char *v, int w)
{
    int i;

    for (i = 1; i != w; ++i)
    {
        v[(i + 1) << 2] = t[(i + 1) << 2];
    }

    return 0;
}
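
As a sanity check that the hand rewrite keeps the behaviour of the original loop, the two versions can be run side by side on the same input. The sketch below is only an illustration: the names foo_shift_add, foo_inc_shift and both_forms_agree, the trip count W and the buffer size BUF are all made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Original form from the first message: shift, then add 4 to the index. */
int foo_shift_add(char *t, char *v, int w)
{
    for (int i = 1; i != w; ++i)
    {
        int x = i << 2;
        v[x + 4] = t[x + 4];
    }
    return 0;
}

/* Hand-rewritten form: increment first, then shift. */
int foo_inc_shift(char *t, char *v, int w)
{
    for (int i = 1; i != w; ++i)
    {
        v[(i + 1) << 2] = t[(i + 1) << 2];
    }
    return 0;
}

/* Hypothetical sizes: the highest index touched is 4 * W. */
enum { W = 64, BUF = 4 * W + 4 };

/* Returns 1 if both versions leave identical destination buffers. */
int both_forms_agree(void)
{
    char src[BUF], dst_a[BUF], dst_b[BUF];

    assert(BUF > 4 * W);
    for (int k = 0; k < BUF; ++k)
        src[k] = (char)(k * 7 + 3);      /* arbitrary test pattern */
    memset(dst_a, 0, sizeof dst_a);
    memset(dst_b, 0, sizeof dst_b);

    foo_shift_add(src, dst_a, W);
    foo_inc_shift(src, dst_b, W);

    return memcmp(dst_a, dst_b, sizeof dst_a) == 0;
}
```

Calling both_forms_agree() should return 1. Compiling each function with gcc -O2 -m32 -S then lets the generated loops be compared directly.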

The private backend I am working on isn't a modification of any
existing backend; it is written from scratch.

---
With best regards, Konstantin

On Fri, Dec 6, 2013 at 1:27 PM, David Brown <david@westcontrol.com> wrote:
> On 06/12/13 09:30, Konstantin Vladimirov wrote:
>> [...]
>
> Hi,
>
> Usually the gcc developers are not keen on emails going to both the help
> and development list - they prefer to keep them separate.
>
> My first thought when someone finds a "missed optimisation" issue,
> especially with the x86 target, is are you /sure/ this code is slower?
> x86 chips are immensely complex, and the interplay between different
> instructions, pipelines, superscaling, etc., means that code that might
> appear faster, can actually be slower.  So please check your
> architecture flags (i.e., are you optimising for the "native" cpu, or
> any other specific cpu - optimised code can be different for different
> x86 cpus).  Then /measure/ the speed of the code to see if there is a
> real difference.
>
>
> Regarding your "private backend" - is this a modification of the x86
> backend, or a completely different target?  If it is x86, then I think
> the answer is "don't do it - work with the mainline code".  If it is
> something else, then an x86-specific optimisation is of little use anyway.
>
> mvh.,
>
> David
>
>
>

Thread overview: 12+ messages
2013-12-06  8:31 x86 gcc lacks simple optimization Konstantin Vladimirov
2013-12-06  9:28 ` David Brown
2013-12-06 10:09 ` Jakub Jelinek
2013-12-06 10:10 ` Richard Biener
2013-12-06 10:19   ` Konstantin Vladimirov
2013-12-06 10:25     ` Richard Biener
2013-12-06 11:31       ` H.J. Lu
2013-12-06 13:52       ` Konstantin Vladimirov
2013-12-06 14:17         ` Richard Biener
2013-12-06 20:54           ` Jeff Law
2013-12-06 12:13 ` Marc Glisse
2013-12-06  9:40 Konstantin Vladimirov