public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/55162] New: Loop ivopts cuts off top bits of loop counter
From: olegendo at gcc dot gnu.org @ 2012-11-01 10:08 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55162
Bug #: 55162
Summary: Loop ivopts cuts off top bits of loop counter
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: olegendo@gcc.gnu.org
Target: sh*-*-*
The following function:
int test (int* x, unsigned int c)
{
  int s = 0;
  unsigned int i;
  for (i = 0; i < c; ++i)
    s += x[i];
  return s;
}
compiled for SH (-O2 -m4 -ml) results in the following code:
        tst     r5,r5       // c == 0 ?
        bt/s    .L6
        mov     #0,r0
        shll2   r5          // c <<= 2
        add     #-4,r5      // c += -4
        shlr2   r5          // c >>= 2 (unsigned shift)
        add     #1,r5       // c += 1
.L3:
        mov.l   @r4+,r1
        dt      r5
        bf/s    .L3
        add     r1,r0
.L6:
        rts
        nop
If the function above is invoked with c = 0x80000000, the loop runs for only
0x40000000 iterations, which looks suspicious.
For example, passing a virtual address 0x00001000 and c = 0x80000000 to the
function should actually run over the address range 0x00001000 .. 0x80001000,
not 0x00001000 .. 0x40001000.
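The adjustment sequence from the SH listing above can be replayed in C to see where the iteration count comes from. This is a minimal sketch, assuming 32-bit registers that wrap modulo 2^32; the function name sh_trip_count is made up for illustration and does not appear in GCC.

```c
#include <stdint.h>

/* Replays the four adjustment instructions from the SH listing:
   shll2, add #-4, shlr2, add #1. All arithmetic wraps modulo 2^32,
   as it would in a 32-bit register. The c == 0 case is not handled
   here because the listing branches around the loop for it. */
uint32_t sh_trip_count(uint32_t c)
{
    uint32_t r5 = c;
    r5 <<= 2;   /* shll2   : element count -> byte count, top two bits lost */
    r5 -= 4;    /* add #-4 */
    r5 >>= 2;   /* shlr2   : logical shift back to an element count */
    r5 += 1;    /* add #1 */
    return r5;
}
```

Counts from 1 up to 0x40000000 come back unchanged. For c = 0x80000000 the first shift produces 0, and the sequence ends at 0x40000000, matching the iteration count observed above.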
I've also checked this on ARM. There, the loop counter is transformed into the
end address and the loop compares the addresses instead of using a
decrement-and-test insn:
        cmp     r1, #0
        beq     .L4
        mov     r3, r0
        add     r1, r0, r1, asl #2
        mov     r0, #0
.L3:
        ldr     r2, [r3], #4
        cmp     r3, r1
        add     r0, r0, r2
        bne     .L3
        bx      lr
.L4:
        mov     r0, r1
        bx      lr
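Written back as C, the ARM loop above corresponds roughly to the following. This is a hand-written sketch for illustration, not compiler output; the name test_end_address is hypothetical, and the comments map each statement to the ARM instructions it stands for.

```c
/* Sums c ints starting at x by walking a pointer up to a
   precomputed end address instead of counting iterations. */
int test_end_address(const int *x, unsigned int c)
{
    int s = 0;
    if (c == 0)              /* cmp r1, #0 ; beq .L4 */
        return 0;
    const int *end = x + c;  /* add r1, r0, r1, asl #2 */
    do {
        s += *x++;           /* ldr r2, [r3], #4 ; add r0, r0, r2 */
    } while (x != end);      /* cmp r3, r1 ; bne .L3 */
    return s;
}
```

The end address is computed once before the loop, which is the setup cost that a decrement-and-test instruction avoids.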
The same could be done on SH, too (comparing against the end address instead of
using a loop counter), but it would add a loop setup overhead. In the optimal
case the above function would result in the following SH code:
        tst     r5,r5
        bt/s    .L6
        mov     #0,r0
.L3:
        mov.l   @r4+,r1
        dt      r5
        bf/s    .L3
        add     r1,r0
.L6:
        rts
        nop
This problem is present on rev 193061 as well as on the 4.7 branch.
* [Bug tree-optimization/55162] Loop ivopts cuts off top bits of loop counter
From: olegendo at gcc dot gnu.org @ 2012-11-01 10:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55162
--- Comment #1 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-11-01 10:11:46 UTC ---
(In reply to comment #0)
> The same could be done on SH, too (comparing against the end address instead of
> using a loop counter), but it would add a loop setup overhead. In the optimal
> case the above function would result in the following SH code:
>
>         tst     r5,r5
>         bt/s    .L6
>         mov     #0,r0
> .L3:
>         mov.l   @r4+,r1
>         dt      r5
>         bf/s    .L3
>         add     r1,r0
> .L6:
>         rts
>         nop
>
... which is the case if '*x++' is used instead of 'x[i]':
int test (int* x, unsigned int c)
{
  int s = 0;
  unsigned int i;
  for (i = 0; i < c; ++i)
    s += *x++;
  return s;
}
* [Bug tree-optimization/55162] Loop ivopts cuts off top bits of loop counter
From: pinskia at gcc dot gnu.org @ 2012-11-02 4:08 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55162
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |INVALID
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-11-02 04:08:25 UTC ---
> For example, passing a virtual address 0x00001000 and c = 0x80000000 to the
> function should actually run over the address range 0x00001000 .. 0x80001000,
No, it runs over the address range 0x00001000 .. -1 and beyond, as 0x80000000 * 4
wraps/overflows. If x were char* then I would say there is a bug, but this is
int*, which has a size of 4.
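The wrap-around can be checked with a small calculation. This is a sketch that assumes a 32-bit address space and a 4-byte int, as on the SH target in this report; the helper names span_bytes_32 and span_bytes_exact are invented for illustration.

```c
#include <stdint.h>

/* Byte span of c elements of elem_size bytes, as a 32-bit target
   would compute it: unsigned arithmetic wraps modulo 2^32. */
uint32_t span_bytes_32(uint32_t c, uint32_t elem_size)
{
    return c * elem_size;
}

/* The same product computed in 64 bits, so nothing wraps. */
uint64_t span_bytes_exact(uint32_t c, uint32_t elem_size)
{
    return (uint64_t)c * elem_size;
}
```

With c = 0x80000000 and a 4-byte element, the exact span is 0x200000000 bytes, which does not fit in 32 bits, and the wrapped value is 0. With a 1-byte element the span is 0x80000000 bytes and nothing wraps, which is why the char* case would be different.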
* [Bug tree-optimization/55162] Loop ivopts cuts off top bits of loop counter
From: olegendo at gcc dot gnu.org @ 2012-11-02 10:07 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55162
Oleg Endo <olegendo at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pinskia at gcc dot gnu.org
--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-11-02 10:07:26 UTC ---
(In reply to comment #2)
> >For example, passing a virtual address 0x00001000 and c = 0x80000000 to the
> function should actually run over the address range 0x00001000 .. 0x80001000,
>
>
> No it runs over the address range 0x00001000 .. -1 and more as 0x80000000 * 4
> wraps/overflows. If x was char* then I would say there is a bug but this is
> int* which has a size of 4.
Ugh, sorry for the mix-up. Of course you're right.
I guess that the pointer wrap-around would fall into the "undefined behavior"
category. If so, then the loop counter adjustment could be left out entirely,
couldn't it?
My point is that the loop counter adjustment can become quite bloated on SH:
struct X
{
  int a, b, c, d, e;
};
int test (X* x, unsigned int c)
{
  int s = 0;
  unsigned int i;
  for (i = 0; i < c; ++i)
    s += x[i].b;
  return s;
}
results in:
        tst     r5,r5
        bt/s    .L4
        mov     r5,r1
        shll2   r1
        add     r5,r1
        mov.l   .L9,r2
        shll2   r1
        add     #-20,r1
        shlr2   r1
        mul.l   r2,r1
        mov.l   .L10,r2
        add     #4,r4
        mov     #0,r0
        sts     macl,r1
        and     r2,r1
        add     #1,r1
.L3:
        mov.l   @r4,r2
        dt      r1
        add     #20,r4
        bf/s    .L3
        add     r2,r0
        rts
        nop
.L4:
        rts
        mov     #0,r0
.L11:
        .align 2
.L9:
        .long   214748365
.L10:
        .long   1073741823
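For readers wondering about the two constants: 214748365 * 5 = 2^30 + 1, so .L9 is the multiplicative inverse of 5 modulo 2^30, and .L10 is the mask 2^30 - 1. The setup code therefore appears to divide 20 * (c - 1) by 20 as a shift by 2 followed by an exact division by 5. The sketch below replays that sequence in C, assuming 32-bit wrapping arithmetic; the function name is hypothetical and this is one reading of the listing, not a statement about how ivopts derives it.

```c
#include <stdint.h>

/* Replays the trip-count setup from the struct X listing (stride 20).
   The c == 0 case is excluded because the listing branches around it. */
uint32_t sh_trip_count_stride20(uint32_t c)
{
    uint32_t r1 = c;
    r1 <<= 2;            /* shll2       : 4c */
    r1 += c;             /* add r5,r1   : 5c */
    r1 <<= 2;            /* shll2       : 20c */
    r1 -= 20;            /* add #-20    : 20(c - 1) */
    r1 >>= 2;            /* shlr2       : 5(c - 1) if nothing wrapped */
    r1 *= 214748365u;    /* mul.l .L9   : times the inverse of 5 mod 2^30 */
    r1 &= 1073741823u;   /* and .L10    : keep the low 30 bits */
    r1 += 1;             /* add #1 */
    return r1;
}
```

For small counts the result equals c, so sixteen setup instructions and two literal-pool constants reproduce a value that was already in r5.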
* [Bug tree-optimization/55162] Loop ivopts cuts off top bits of loop counter
From: olegendo at gcc dot gnu.org @ 2012-11-03 12:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55162
--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-11-03 12:19:28 UTC ---
(In reply to comment #3)
I've created a new PR 55190 for this.