From: "Fredrik Hederstierna" <fredrik.hederstierna@verisure.com>
To: gcc@gcc.gnu.org
Subject: Question about loop induction variables
Date: Sun, 07 May 2017 09:23:00 -0000 [thread overview]
Message-ID: <OF3A7902EE.B4E6CF52-ON00258119.0031F221-00258119.00338AF1@notes.na.collabserv.com> (raw)
In-Reply-To:
Hi,
I have a question about loop induction variables, related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67213
Consider a simple loop like
int ix;
for (ix = 0; ix < 6; ix++) {
data[ix] = ix;
}
In this case variable 'ix' is used as counting variable for array index,
but also used as value for storage, so in some sense its used in two different "ways".
Several target architectures might have support to auto-increment pointer registers when storing or reading data,
but if same variable is used as both array index to pointer and as data it might be more complicated?
In the example above, it seems like GCC loop analyzer (for completely peeling) thinks 'ix' is an induction variable that can be folded away:
from 'tree-ssa-loop-ivcanon.c':
Loop 1 iterates 5 times.
Loop 1 iterates at most 5 times.
Estimating sizes for loop 1
BB: 3, after_exit: 0
size: 0 _4 = (char) i_9;
Induction variable computation will be folded away.
size: 1 data[i_9] = _4;
size: 1 i_6 = i_9 + 1; <----- OK?
Induction variable computation will be folded away.
size: 1 ivtmp_7 = ivtmp_1 - 1;
Induction variable computation will be folded away.
size: 2 if (ivtmp_7 != 0)
Exit condition will be eliminated in peeled copies.
BB: 4, after_exit: 1
size: 5-4, last_iteration: 5-2
Loop size: 5
Estimated size after unrolling: 5
Then index 'ix' is considered to be a induction variable, but possibly it cannot be simply folded since its used in other ways as well?
Though completely-peeling loop resulted in longer code:
int i;
char _4;
unsigned int ivtmp_7;
char _12;
unsigned int ivtmp_15;
char _19;
unsigned int ivtmp_22;
char _26;
unsigned int ivtmp_29;
char _33;
unsigned int ivtmp_36;
char _40;
unsigned int ivtmp_43;
<bb 2>:
_12 = 0;
data[0] = _12;
i_14 = 1;
ivtmp_15 = 5;
_19 = (char) i_14;
data[i_14] = _19;
i_21 = i_14 + 1;
ivtmp_22 = ivtmp_15 + 4294967295;
_26 = (char) i_21;
data[i_21] = _26;
i_28 = i_21 + 1;
ivtmp_29 = ivtmp_22 + 4294967295;
_33 = (char) i_28;
data[i_28] = _33;
i_35 = i_28 + 1;
ivtmp_36 = ivtmp_29 + 4294967295;
_40 = (char) i_35;
data[i_35] = _40;
i_42 = i_35 + 1;
ivtmp_43 = ivtmp_36 + 4294967295;
_4 = (char) i_42;
data[i_42] = _4;
i_6 = i_42 + 1;
ivtmp_7 = ivtmp_43 + 4294967295;
return;
instead of original and shorter
int i;
unsigned int ivtmp_1;
char _4;
unsigned int ivtmp_7;
<bb 2>:
<bb 3>:
# i_9 = PHI <i_6(4), 0(2)>
# ivtmp_1 = PHI <ivtmp_7(4), 6(2)>
_4 = (char) i_9;
data[i_9] = _4;
i_6 = i_9 + 1;
ivtmp_7 = ivtmp_1 - 1;
if (ivtmp_7 != 0)
goto <bb 4>;
else
goto <bb 5>;
<bb 4>:
goto <bb 3>;
<bb 5>:
return;
Example for ARM target machine code it became longer:
0000001c <test_iter_6>:
1c: e59f3030 ldr r3, [pc, #48] ; 54 <test_iter_6+0x38>
20: e3a02000 mov r2, #0
24: e5c32000 strb r2, [r3]
28: e3a02001 mov r2, #1
2c: e5c32001 strb r2, [r3, #1]
30: e3a02002 mov r2, #2
34: e5c32002 strb r2, [r3, #2]
38: e3a02003 mov r2, #3
3c: e5c32003 strb r2, [r3, #3]
40: e3a02004 mov r2, #4
44: e5c32004 strb r2, [r3, #4]
48: e3a02005 mov r2, #5
4c: e5c32005 strb r2, [r3, #5]
50: e12fff1e bx lr
54: 00000000 .word 0x00000000
compared to if complete-loop-peeling was not done
0000001c <test_iter_6>:
1c: e59f2014 ldr r2, [pc, #20] ; 38 <test_iter_6+0x1c>
20: e3a03000 mov r3, #0
24: e7c33002 strb r3, [r3, r2]
28: e2833001 add r3, r3, #1
2c: e3530006 cmp r3, #6
30: 1afffffb bne 24 <test_iter_6+0x8>
34: e12fff1e bx lr
38: 00000000 .word 0x00000000
Producing 15 instead of 8 words, giving ~100% larger code size.
And same for x86 arch:
f: c6 05 00 00 00 00 00 movb $0x0,0x0(%rip) # 16
<test_iter_6+0x7>
16: c6 05 00 00 00 00 01 movb $0x1,0x0(%rip) # 1d
<test_iter_6+0xe>
1d: c6 05 00 00 00 00 02 movb $0x2,0x0(%rip) # 24
<test_iter_6+0x15>
24: c6 05 00 00 00 00 03 movb $0x3,0x0(%rip) # 2b
<test_iter_6+0x1c>
2b: c6 05 00 00 00 00 04 movb $0x4,0x0(%rip) # 32
<test_iter_6+0x23>
32: c6 05 00 00 00 00 05 movb $0x5,0x0(%rip) # 39
<test_iter_6+0x2a>
39: c3 retq
Without complete-loop-peeling done:
f: 31 c0 xor %eax,%eax
11: 88 80 00 00 00 00 mov %al,0x0(%rax)
17: 48 ff c0 inc %rax
1a: 48 83 f8 06 cmp $0x6,%rax
1e: 75 f1 jne 11 <test_iter_6+0x2>
20: c3 retq
Producing 43 instead of 18 bytes, ~140% bigger code size.
Should a variable be considered to be induction variable also when its used both as LHS array index and RHS data value in same loop?
Since its cross-target for both ARM and x86, does it origin in some cost estimation on how 'ix' or other induction variables will be folded or not in the final target code?
Thanks, Kind Regards,
Fredrik
next reply other threads:[~2017-05-07 9:23 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-07 9:23 Fredrik Hederstierna [this message]
2017-05-08 7:15 ` Richard Biener via gcc
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=OF3A7902EE.B4E6CF52-ON00258119.0031F221-00258119.00338AF1@notes.na.collabserv.com \
--to=fredrik.hederstierna@verisure.com \
--cc=gcc@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).