* m68k optimisations?
@ 2013-11-08 11:08 Fredrik Olsson
2013-12-05 0:48 ` Maxim Kuvyrkov
From: Fredrik Olsson @ 2013-11-08 11:08 UTC
To: gcc
I have this simple function:
#include <stdarg.h>

int sum_vec(int c, ...) {
    va_list argptr;
    va_start(argptr, c);
    int sum = 0;
    while (c--) {
        int x = va_arg(argptr, int);
        sum += x;
    }
    va_end(argptr);
    return sum;
}
When compiling with "-fomit-frame-pointer -Os -march=68000 -c -S
-mshort" I get this assembly (I have manually added comments with
clock cycles per instruction and a total for a count of 0, 8 and n>0):
        .even
        .globl _sum_vec
_sum_vec:
        lea (6,%sp),%a0         | 8
        move.w 4(%sp),%d1       | 12
        clr.w %d0               | 4
        jra .L1                 | 12
.L2:
        add.w (%a0)+,%d0        | 8
.L1:
        dbra %d1,.L2            | 16,12
        rts                     | 16
| c==0: 8+12+4+12+12+16=64
| c==8: 8+12+4+12+(16+8)*8+12+16=256
| c==n: =64+24n
When instead compiling with "-fomit-frame-pointer -O3 -march=68000 -c
-S -mshort" I expect to get more aggressive optimisation than -Os, or
at least just as performant, but instead I get this:
        .even
        .globl _sum_vec
_sum_vec:
        move.w 4(%sp),%d0       | 12
        jeq .L2                 | 12,8
        lea (6,%sp),%a0         | 8
        subq.w #1,%d0           | 4
        and.l #65535,%d0        | 16
        add.l %d0,%d0           | 8
        lea 8(%sp,%d0.l),%a1    | 16
        clr.w %d0               | 4
.L1:
        add.w (%a0)+,%d0        | 8
        cmp.l %a0,%a1           | 8
        jne .L1                 | 12,8
        rts                     | 16
.L2:
        clr.w %d0               | 4
        rts                     | 16
| c==0: 12+12+4+16=44
| c==8: 12+8+8+4+16+8+16+4+(8+8+12)*8-4+16=312
| c==n: =88+28n
The count==0 case is better. I can see what optimisation has been
attempted for the loop, but it is just not working, since both the
initialisation for the loop and the loop itself become more costly.
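As a cross-check of the totals, here is a minimal sketch in C that adds up
the annotated per-instruction figures for a count of n. The function names
cycles_os and cycles_o3 are made up for illustration, and the cycle values
are taken as annotated in the listings rather than re-derived from a 68000
manual.

```c
/* Cycle model built from the per-instruction figures annotated in the
 * two listings. The figures are the annotated ones, not values checked
 * against a processor manual. Function names are illustrative only. */

/* -Os listing: lea + move.w + clr.w + jra, then n times
 * (dbra taken + add.w), then the final dbra fall-through and rts. */
unsigned long cycles_os(unsigned int n)
{
    unsigned long setup = 8 + 12 + 4 + 12;
    unsigned long per_iter = 16 + 8;
    unsigned long tail = 12 + 16;

    return setup + per_iter * n + tail;
}

/* -O3 listing: early exit for n == 0, otherwise the end-pointer setup
 * followed by n times (add.w + cmp.l + jne taken). The last jne falls
 * through, which costs 4 cycles less than a taken branch. */
unsigned long cycles_o3(unsigned int n)
{
    if (n == 0)
        return 12 + 12 + 4 + 16; /* move.w, jeq taken, clr.w, rts */

    unsigned long setup = 12 + 8 + 8 + 4 + 16 + 8 + 16 + 4;
    unsigned long per_iter = 8 + 8 + 12;
    unsigned long tail = 16; /* rts */

    return setup + per_iter * n - 4 + tail;
}
```

Under this model the two functions reduce to 64+24n and 88+28n, so the -O3
code only wins for n == 0; for every n > 0 it is 24+4n cycles slower.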
Being a GCC beginner, I would like a few pointers on how I should go
about fixing this.
// Fredrik
* Re: m68k optimisations?
2013-11-08 11:08 m68k optimisations? Fredrik Olsson
@ 2013-12-05 0:48 ` Maxim Kuvyrkov
From: Maxim Kuvyrkov @ 2013-12-05 0:48 UTC
To: Fredrik Olsson; +Cc: gcc
On 9/11/2013, at 12:08 am, Fredrik Olsson <peylow@gmail.com> wrote:
> I have this simple function:
> #include <stdarg.h>
>
> int sum_vec(int c, ...) {
>     va_list argptr;
>     va_start(argptr, c);
>     int sum = 0;
>     while (c--) {
>         int x = va_arg(argptr, int);
>         sum += x;
>     }
>     va_end(argptr);
>     return sum;
> }
>
>
> When compiling with "-fomit-frame-pointer -Os -march=68000 -c -S
> -mshort" I get this assembly (I have manually added comments with
> clock cycles per instruction and a total for a count of 0, 8 and n>0):
>         .even
>         .globl _sum_vec
> _sum_vec:
>         lea (6,%sp),%a0         | 8
>         move.w 4(%sp),%d1       | 12
>         clr.w %d0               | 4
>         jra .L1                 | 12
> .L2:
>         add.w (%a0)+,%d0        | 8
> .L1:
>         dbra %d1,.L2            | 16,12
>         rts                     | 16
> | c==0: 8+12+4+12+12+16=64
> | c==8: 8+12+4+12+(16+8)*8+12+16=256
> | c==n: =64+24n
>
> When instead compiling with "-fomit-frame-pointer -O3 -march=68000 -c
> -S -mshort" I expect to get more aggressive optimisation than -Os, or
> at least just as performant, but instead I get this:
>         .even
>         .globl _sum_vec
> _sum_vec:
>         move.w 4(%sp),%d0       | 12
>         jeq .L2                 | 12,8
>         lea (6,%sp),%a0         | 8
>         subq.w #1,%d0           | 4
>         and.l #65535,%d0        | 16
>         add.l %d0,%d0           | 8
>         lea 8(%sp,%d0.l),%a1    | 16
>         clr.w %d0               | 4
> .L1:
>         add.w (%a0)+,%d0        | 8
>         cmp.l %a0,%a1           | 8
>         jne .L1                 | 12,8
>         rts                     | 16
> .L2:
>         clr.w %d0               | 4
>         rts                     | 16
> | c==0: 12+12+4+16=44
> | c==8: 12+8+8+4+16+8+16+4+(8+8+12)*8-4+16=312
> | c==n: =88+28n
>
> The count==0 case is better. I can see what optimisation has been
> attempted for the loop, but it is just not working, since both the
> initialisation for the loop and the loop itself become more costly.
>
> Being a GCC beginner, I would like a few pointers on how I should go
> about fixing this.
The way to investigate such problems is to compare the intermediate debug dumps of the two compilation scenarios; by the time you are looking at the assembly, it is almost impossible to guess where the problem comes from. Add -fdump-tree-all and -fdump-rtl-all to the compilation flags and find which optimization pass makes the wrong decision. Then either trace through that optimization pass yourself or file a bug report in the hope that someone (the optimization's maintainer) will look at it.
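One way to set that up, as a sketch: keep the function in its own file and compile it twice into separate directories so that the two sets of dumps do not overwrite each other. The file name sum_vec.c, the cross-compiler name m68k-elf-gcc and the directory names os/ and o3/ are assumptions made for illustration; the flags are the ones already mentioned in this thread.

```c
/* sum_vec.c: reduced test case for comparing the -Os and -O3 dumps.
 *
 * Hypothetical commands (m68k-elf-gcc, os/ and o3/ are assumed names).
 * The dump files should end up next to the -o output file, one per
 * pass, with the pass name as part of the file name suffix:
 *
 *   mkdir os o3
 *   FLAGS="-fomit-frame-pointer -march=68000 -mshort -S"
 *   DUMPS="-fdump-tree-all -fdump-rtl-all"
 *   m68k-elf-gcc $FLAGS $DUMPS -Os sum_vec.c -o os/sum_vec.s
 *   m68k-elf-gcc $FLAGS $DUMPS -O3 sum_vec.c -o o3/sum_vec.s
 *
 * Then compare the matching dump files in os/ and o3/ pass by pass and
 * look for the first pass where the loop starts to differ.
 */
#include <stdarg.h>

/* Sums the c integer arguments that follow c. */
int sum_vec(int c, ...)
{
    va_list argptr;
    int sum = 0;

    va_start(argptr, c);
    while (c--) {
        sum += va_arg(argptr, int);
    }
    va_end(argptr);

    return sum;
}
```

Keeping the test case this small also makes it suitable for attaching to a bug report as-is.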
Read through the GCC wiki for information on debugging and troubleshooting GCC:
- http://gcc.gnu.org/wiki/GettingStarted
- http://gcc.gnu.org/wiki/FAQ
- http://gcc.gnu.org/wiki/
Thanks,
--
Maxim Kuvyrkov
www.kugelworks.com