From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 4521 invoked by alias); 17 Jun 2013 00:13:49 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 4479 invoked by uid 48); 17 Jun 2013 00:13:44 -0000
From: "olegendo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/55190] [SH] ivopts causes loop setup bloat
Date: Mon, 17 Jun 2013 00:13:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 4.8.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: olegendo at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: component
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-06/txt/msg00888.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55190

Oleg Endo changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization

--- Comment #2 from Oleg Endo ---
Looking at a simpler case (with -O2) ....

int test_0 (int* x, int y)
{
  int sum = 0;
  for (int i = 0; i < y; ++i)
    sum += x[i];
  return sum;
}

        cmp/pl  r5
        bf/s    .L6
        mov     #0,r0
        shll2   r5
        add     #-4,r5
        shlr2   r5
        add     #1,r5    // r5 = ((((unsigned int)y << 2) - 4) >> 2) + 1
        .align 2
.L3:
        mov.l   @r4+,r1
        dt      r5
        bf/s    .L3
        add     r1,r0
.L6:
        rts
        nop

In this case, if 'y' initially has the value '0x7FFFFFFF' the resulting loop
count is truncated to '0x3FFFFFFF'.  This is sort of OK, since the resulting
address would overflow and that is undefined behavior.
On the other hand, if an unlucky address for 'x' is passed in the first place,
the resulting address might overflow much earlier than that.  Thus the loop
counter fiddling seems pointless.

The tree-ssa-ivopts pass converts the loop to this:

Replacing exit test: if (y_3(D) > i_11)

int test_0(int*, int) (int * x, int y)
{
  unsigned int ivtmp.6;
  int i;
  int sum;
  unsigned int i.0;
  unsigned int _1;
  void * _2;
  int _9;
  unsigned int _19;
  unsigned int _20;
  unsigned int _21;

  :
  if (y_3(D) > 0)
    goto ;
  else
    goto ;

  :
  ivtmp.6_12 = (unsigned int) x_6(D);
  _1 = (unsigned int) y_3(D);
  _21 = _1 * 4;
  _20 = (unsigned int) x_6(D);
  _19 = _20 + _21;

  :
  # sum_16 = PHI
  # ivtmp.6_15 = PHI
  _2 = (void *) ivtmp.6_15;
  _9 = MEM[base: _2, offset: 0B];
  sum_10 = _9 + sum_16;
  ivtmp.6_14 = ivtmp.6_15 + 4;
  if (ivtmp.6_14 != _19)
    goto ;
  else
    goto ;

  :
  # sum_18 = PHI
  goto ;

  :
  goto ;

  :
  # sum_13 = PHI
  return sum_13;

}

... which uses address '(x + y * 4)' as the loop exit test.
It is expanded to RTL as '(x + (y << 2))'

;; Generating RTL for gimple basic block 3

;; ivtmp.6_12 = (unsigned int) x_6(D);

(insn 38 37 0 (set (reg:SI 190 [ ivtmp.6 ])
        (reg/v/f:SI 194 [ x ])) -1
     (nil))

;; _19 = ivtmp.6_12 + _21;

(insn 39 38 40 (set (reg:SI 196 [ D.1617 ])
        (ashift:SI (reg/v:SI 195 [ y ])
            (const_int 2 [0x2]))) -1
     (nil))

(insn 40 39 0 (set (reg:SI 191 [ D.1617 ])
        (plus:SI (reg:SI 190 [ ivtmp.6 ])
            (reg:SI 196 [ D.1617 ]))) -1
     (nil))

... and remains until the loop2_doloop RTL pass, which converts the whole thing
into a decrement-and-test loop and adds the other loop counter modifications:

Analyzing operand (reg:SI 191 [ D.1617 ]) of insn (insn 45 44 46 4 (set (reg:SI 147 t)
        (eq:SI (reg:SI 190 [ ivtmp.6 ])
            (reg:SI 191 [ D.1617 ]))) sh_tmp.cpp:5 17 {cmpeqsi_t}
     (expr_list:REG_DEAD (reg:SI 191 [ D.1617 ])
        (nil)))
  invariant (reg:SI 191 [ D.1617 ]) (in SI)
;; improved upper bound by one.
;; Determined upper bound -2.
Loop 1 is simple:
  simple exit 4 -> 7
  number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 191 [ D.1617 ])
            (reg:SI 190 [ ivtmp.6 ]))
        (const_int -4 [0xfffffffffffffffc]))
    (const_int 2 [0x2]))
  upper bound: 1073741822
  realistic bound: -1

When the tree-ssa-ivopts pass is turned off via -fno-ivopts, the code in
loop-iv.c gets the actual loop count upper bound instead of the truncated
address upper bound, and it works out the correct loop count.

I have also tried out the same code on ARM:

        cmp     r1, #0
        ble     .L4
        mov     r3, r0
        add     r1, r0, r1, asl #2
        mov     r0, #0
.L3:
        ldr     r2, [r3], #4
        cmp     r3, r1
        add     r0, r0, r2
        bne     .L3
        bx      lr
.L4:
        mov     r0, #0
        bx      lr

Since there is no doloop pattern on ARM, the code is left as it was output by
the tree-ssa-ivopts pass, i.e. the exit test uses 'x + (y << 2)'.  So this
doesn't seem to be an SH-only issue.

However, I'm not sure whether this is more a problem of tree-ssa-ivopts or of
loop-iv.  If the tree-ssa-ivopts pass left the loop counter alone, the doloop
pass would pick it up and the upper bound calculations in this case would go
away.  However, targets that can do better without doloop (such as ARM) would
probably suffer.  So it would probably be better to handle this overflow case
in loop-iv.
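
For reference, the loop counter setup from the SH code above can be modelled in
plain C to see where the truncation comes from.  This is only a sketch of the
four-instruction sequence (shll2, add #-4, shlr2, add #1) using 32-bit unsigned
arithmetic.  The function name 'doloop_count' is made up for illustration and
does not exist in GCC.  It assumes 'y > 0', since the cmp/pl guard skips the
loop otherwise.

```c
#include <stdint.h>

/* Hypothetical helper (not part of GCC): models the SH loop setup
   r5 = ((((unsigned int)y << 2) - 4) >> 2) + 1
   with 32-bit wrap-around arithmetic.  Assumes y > 0.  */
uint32_t doloop_count(int32_t y)
{
    uint32_t r5 = (uint32_t)y;
    r5 <<= 2;   /* shll2 r5    : byte offset of the end address, mod 2^32 */
    r5 -= 4;    /* add #-4,r5  : offset of the last element */
    r5 >>= 2;   /* shlr2 r5    : back to an element index, top 2 bits lost */
    r5 += 1;    /* add #1,r5   : index of last element + 1 = trip count */
    return r5;
}
```

With this model, doloop_count (0x7FFFFFFF) evaluates to 0x3FFFFFFF, matching
the truncated count described above, while any 'y' from 1 up to 0x40000000
comes back unchanged.  In other words the shift pair only costs something for
counts whose byte offset no longer fits in 32 bits.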