[Bug tree-optimization/59999] New: Sign extension in loop regression blocks generation of zero overhead loop

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/59999] New: Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-01-30 21:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

            Bug ID: 59999
           Summary: Sign extension in loop regression blocks generation of
                    zero overhead loop
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: paulo@matos-sorge.com

Following the discussion in the mailing list thread:
http://gcc.gnu.org/ml/gcc/2014-01/msg00319.html

I removed the undefined behaviour mentioned by Andreas. This code:

extern short delayLength;
typedef int Sample;
extern Sample *temp_ptr;
extern Sample x;

void
foo (short blockSize)
{
  short i;
  unsigned short loopCount;

  loopCount = (unsigned short)(blockSize + delayLength) % 8;

  for (i = 0; (int)i < (int)loopCount; i++)
    {
      *temp_ptr = x ^ *temp_ptr;
      temp_ptr++;
    }
}

displays the same regression.
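
As a side note on the bound itself, here is a minimal sketch (not part of the
testcase; the helper names loop_count and loop_count_is_small are made up for
illustration) showing that loopCount always ends up in the range 0..7. It
assumes a 16-bit short and a 32-bit int, as on v850.

```c
#include <limits.h>

/* Hypothetical helper mirroring the bound computed in foo(): the sum is
   evaluated in int, converted to unsigned short, then reduced modulo 8.  */
unsigned short
loop_count (short blockSize, short delayLength)
{
  return (unsigned short)(blockSize + delayLength) % 8;
}

/* Returns 1 if every input pair on a coarse grid yields a bound in 0..7,
   and 0 otherwise.  The grid step of 257 only keeps the check fast.  */
int
loop_count_is_small (void)
{
  int a, b;

  for (a = SHRT_MIN; a <= SHRT_MAX; a += 257)
    for (b = SHRT_MIN; b <= SHRT_MAX; b += 257)
      if (loop_count ((short) a, (short) b) > 7)
        return 0;
  return 1;
}
```

With a bound that small, the short counter in this particular testcase never
gets anywhere near the point where it could wrap.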
v850 on trunk with -O2 -mv850e3v5:
_foo:
.LFB0:
        mov hilo(_delayLength),r10
        ld.h 0[r10],r10
        add r10,r6
        andi 7,r6,r6
        be .L1
        mov hilo(_temp_ptr),r15
        mov 0,r10
        ld.w 0[r15],r11
        mov hilo(_x),r14
.L4:
        ld.w 0[r11],r13
        ld.w 0[r14],r12
        add 1,r10
        add 4,r11
        xor r13,r12
        sxh r10
        st.w r12,-4[r11]
        cmp r6,r10
        blt .L4
        st.w r11,0[r15]
.L1:
        jmp [r31]
.LFE0:
        .size   _foo, .-_foo
        .section        .debug_frame,"",@progbits


GCC until 
commit e0ae2fe2a0bebe9de31e3d8eb4feace4909ef009
Author: vries <vries@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri May 20 19:32:30 2011 +0000

    2011-05-20  Tom de Vries  <tom@codesourcery.com>

        PR target/45098
        * tree-ssa-loop-ivopts.c: Include expmed.h.
        (get_shiftadd_cost): New function.
        (force_expr_to_var_cost): Declare forward.  Use get_shiftadd_cost.


    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@173976
138bc75d-0d04-0410-961f-82ee72b054a4

could do better by avoiding the sign extend inside the loop.
At the time this was not much of a problem. Nowadays we support zero overhead
loops for v850, and one is not generated here because of the sign extend. The
mep backend is in a similar situation.


* [Bug tree-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: rguenth at gcc dot gnu.org @ 2014-01-31 10:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.9.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I guess pure co-incidence


* [Bug tree-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-01-31 10:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #2 from Paulo J. Matos <paulo@matos-sorge.com> ---
(In reply to Richard Biener from comment #1)
> I guess pure co-incidence

If I understand you correctly, you think the patch I mentioned is not the
culprit but merely exposed the problem. That may well be true. I found the
patch with a simple git bisect, using v850 as the test backend; the mep backend
is much more recent, so I couldn't really test with it.


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: rguenth at gcc dot gnu.org @ 2014-01-31 11:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|tree-optimization           |rtl-optimization

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yes, I think that the IV choice merely shows that we fail to optimize the
extension, which would happen somewhere in the RTL opt pipeline.


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-01-31 12:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #4 from Paulo J. Matos <paulo@matos-sorge.com> ---
(In reply to Richard Biener from comment #3)
> Yes, I think that the IV choice merely shows that we miss to optimize the
> extension - which would be somewhere in the RTL opt pipeline.

Makes sense. My first instinct was to do it in expand. However, expand
processes one gimple statement at a time, so this might be too much for it to
handle: it would probably have to detect the sign extend and promote the type
of the register if there are no conflicting conditions.

If you can suggest where to do this kind of thing, I can give it a try.

Thanks.


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: rguenth at gcc dot gnu.org @ 2014-01-31 14:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Apart from expand, there is the redundant extension elimination pass, ree.c.


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-01-31 15:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #6 from Paulo J. Matos <paulo@matos-sorge.com> ---
Hmm, ree is no good because by then we have already missed the generation of
zero overhead loops. Do you think this is something that could be added to
expand?


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-05 11:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #7 from Paulo J. Matos <paulo@matos-sorge.com> ---
(In reply to Richard Biener from comment #5)
> Apart from expand there is the redundant-extension-elimination, ree.c.

In expand we get the following gimple for the loop:
;;   basic block 4, loop depth 0
;;    pred:       2
;;                4
  # i_15 = PHI <0(2), i_12(4)>
  # _18 = PHI <0(2), _4(4)>
  _6 = arr[_18];
  _7 = _6 + 1;
  arr[_18] = _7;
  _17 = (unsigned short) i_15;
  _13 = _17 + 1;
  i_12 = (short int) _13;
  _4 = (int) i_12;
  if (_4 < limit_5(D))
    goto <bb 4>;
  else
    goto <bb 3>;
;;    succ:       4
;;                3


Where _13 is an unsigned short and what we want to eliminate is this sign
extend:
  _4 = (int) i_12;

This doesn't seem trivial in the expand phase because, to eliminate the sign
extend, you promote i_12 to int and then have to promote a bunch of other
variables whose insns have already been emitted by the time you get here.
Shouldn't ivopts notice that generating an int IV saves a sign extend and is
therefore better?


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-05 12:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #8 from Paulo J. Matos <paulo@matos-sorge.com> ---
(In reply to Paulo J. Matos from comment #7)

Made a mistake. With the attached test, the final gimple before expand for the
loop basic block is:
;;   basic block 5, loop depth 0
;;    pred:       5
;;                4
  # i_26 = PHI <i_1(5), 0(4)>
  # ivtmp.24_18 = PHI <ivtmp.24_12(5), ivtmp.24_29(4)>
  _28 = (void *) ivtmp.24_18;
  _13 = MEM[base: _28, offset: 0B];
  x.4_14 = x;
  _15 = _13 ^ x.4_14;
  MEM[base: _28, offset: 0B] = _15;
  ivtmp.24_12 = ivtmp.24_18 + 4;
  temp_ptr.5_17 = (Sample *) ivtmp.24_12;
  _11 = (unsigned short) i_26;
  _2 = _11 + 1;
  i_1 = (short int) _2;
  _10 = (int) i_1;
  if (_10 < _25)
    goto <bb 5>;
  else
    goto <bb 6>;
;;    succ:       5
;;                6

However, the point is the same. IVOPTS should probably generate an int IV
instead of a short int IV to avoid the sign extend since removing the sign
extend during RTL seems to be quite hard.

What do you think?


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-05 12:15 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #9 from Paulo J. Matos <paulo@matos-sorge.com> ---
Created attachment 32044
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32044&action=edit
Testcase


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-05 15:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #10 from Paulo J. Matos <paulo@matos-sorge.com> ---
(In reply to Paulo J. Matos from comment #8)

For >= 4.8 the scalar evolution of _10 is deemed not simple, because it looks
like the following:
 <nop_expr 0x2aaaaacd9ee0
    type <integer_type 0x2aaaaab16690 int public SI
        size <integer_cst 0x2aaaaab12c60 constant 32>
        unit size <integer_cst 0x2aaaaab12c80 constant 4>
        align 32 symtab 0 alias set 3 canonical type 0x2aaaaab16690 precision 32 min <integer_cst 0x2aaaaab12f80 -2147483648> max <integer_cst 0x2aaaaab12fa0 2147483647> context <translation_unit_decl 0x2aaaaab29c00 D.2881>
        pointer_to_this <pointer_type 0x2aaaaab23348>>

    arg 0 <polynomial_chrec 0x2aaaaacdb090
        type <integer_type 0x2aaaaab16540 short int sizes-gimplified public HI
            size <integer_cst 0x2aaaaab12f20 constant 16>
            unit size <integer_cst 0x2aaaaab12f40 constant 2>
            align 16 symtab 0 alias set 4 canonical type 0x2aaaaab16540 precision 16 min <integer_cst 0x2aaaaab12ec0 -32768> max <integer_cst 0x2aaaaab12ee0 32767>
            pointer_to_this <pointer_type 0x2aaaaaca1f18>>

        arg 0 <integer_cst 0x2aaaaab1f260 constant 1>
        arg 1 <integer_cst 0x2aaaaacc9140 constant 1> arg 2 <integer_cst 0x2aaaaacc9140 1>>>

This is something like: (int) (short int) {1, +, 1}_1. Since these are signed
integers and we can assume they don't overflow, can't we simplify the scalar
evolution to a polynomial_chrec over 32-bit integers and drop the nop_expr
that represents the sign extend?
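
One way to probe that assumption is to evaluate both candidate evolutions at
iteration n and see where they part ways. The sketch below is not GCC code:
the names narrow_evolution, wide_evolution and first_divergence are
hypothetical, and it assumes a 16-bit short, a 32-bit int and the wrapping
narrowing conversion that GCC implements.

```c
/* Value of (int) (short int) {1, +, 1}_1 at iteration n: the increment
   wraps in 16 bits and the result is then sign-extended to int.  */
int
narrow_evolution (unsigned int n)
{
  unsigned short u = (unsigned short)(1u + n);
  return (int)(short) u;
}

/* Value of a hypothetical chrec {1, +, 1}_1 computed directly in int.  */
int
wide_evolution (unsigned int n)
{
  return (int)(1u + n);
}

/* Returns the first iteration below LIMIT at which the two evolutions
   differ, or -1 if they agree on the whole range.  */
long
first_divergence (unsigned int limit)
{
  unsigned int n;

  for (n = 0; n < limit; n++)
    if (narrow_evolution (n) != wide_evolution (n))
      return (long) n;
  return -1;
}
```

Under these assumptions the two agree only while the 16-bit value stays in
range, so dropping the nop_expr would need a proof that the loop exits before
that point.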


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-05 17:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #11 from Paulo J. Matos <paulo@matos-sorge.com> ---
(In reply to Paulo J. Matos from comment #10)

This chain of nop_expr in the scalar evolution is due to Richard's fix for
PR53676. It is still not clear to me what the fix is for, whether it needs
tweaking, or whether a later pass needs to remove the widening from the loop.
I am investigating.


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: rguenth at gcc dot gnu.org @ 2014-02-06 10:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Paulo J. Matos from comment #10)
> This is something like: (int) (short int) {1, +, 1}_1. Since these are
> signed integers, we can assume they don't overflow, can't we simplify the
> scalar evolution to a polynomial_chrec over 32bit integers and forget the
> nop_expr that represents the sign extend?

Note that {1, +, 1}_1 is unsigned.  The issue is that while i is short,
i++ is really i = (short)((int) i + 1).  Only the operation in type 'int'
is known not to overflow, so the IV in short _can_ overflow and the loop
can run forever, for example for loopCount == SHRT_MAX + 1.

The fix to SCEV analysis was needed to still be able to analyze the evolution
at all.

The testcase is simply very badly written (unsigned short upper bound,
signed short IV and IV comparison against upper bound in signed int).
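
The non-termination case can be reproduced at source level. This is a minimal
sketch rather than the testcase itself: count_iterations is a hypothetical
name, the cap argument is a safety net that the original loop does not have,
and a 16-bit short with GCC's wrapping narrowing conversion is assumed.

```c
/* Runs a loop with a short counter compared against an int bound and
   returns the number of iterations executed.  Reaching CAP means the
   loop would not have terminated by itself.  */
long
count_iterations (int bound, long cap)
{
  short i;
  long n = 0;

  /* i++ is evaluated as i = (short)((int) i + 1), so i never exceeds
     the largest short value and wraps to negative instead.  */
  for (i = 0; (int) i < bound && n < cap; i++)
    n++;
  return n;
}
```

A bound of 32767 still terminates after 32767 iterations, while a bound of
32768 runs until the cap because the counter wraps before it can reach it.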


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-06 11:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #13 from Paulo J. Matos <paulo@matos-sorge.com> ---
(In reply to Richard Biener from comment #12)
> 
> Note that {1, +, 1}_1 is unsigned.  The issue is that while i is short
> i++ is really i = (short)((int) i + 1) and thus only the operation in
> type 'int' is known to not overflow and thus the IV in short _can_
> overflow and the loop can loop infinitely for example for loopCount
> == SHORT_MAX + 1.
> 
> The fix to SCEV analysis was to still be able to analyze the evolution at
> all.
> 
> The testcase is simply very badly written (unsigned short upper bound,
> signed short IV and IV comparison against upper bound in signed int).

I thought no signed operation could overflow, regardless of its width, and
therefore (short) (int + 1) shouldn't overflow.

I agree with you on the testcase. However, it is taken from customer code and,
even if badly written, it is acceptable C. GCC 4.5.4 generates the scalar
evolution {1, +, 1}_1 for the integer variable, without the casts (therefore a
simple_iv). This allows GCC to use an int for an IV, which helps discard the
sign extend in the loop body and later allows the zero overhead loop to be
generated. This case happens again and again and causes a serious performance
regression on customer code.

* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-06 11:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #14 from Paulo J. Matos <paulo@matos-sorge.com> ---
Something like this which looks much simpler hits the same problem:
extern int arr[];

void
foo32 (int limit)
{
  short i;
  for (i = 0; (int)i < limit; i++)
    arr[i] += 1;
}
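
For comparison, this is roughly what the loop looks like with the IV widened
to int by hand. It is only a sketch: the functions take the array as a
parameter so the example is self-contained, and bump_short_iv, bump_int_iv,
same_result and CHECK_LEN are hypothetical names. The two loops are equivalent
only while limit does not exceed the largest short value.

```c
#include <string.h>

enum { CHECK_LEN = 64 };  /* hypothetical buffer size for the check */

/* Same shape as foo32: short counter, comparison done in int.  */
void
bump_short_iv (int *arr, int limit)
{
  short i;

  for (i = 0; (int) i < limit; i++)
    arr[i] += 1;
}

/* Hand-widened variant: the counter is an int, so no sign extend is
   needed before the comparison.  */
void
bump_int_iv (int *arr, int limit)
{
  int i;

  for (i = 0; i < limit; i++)
    arr[i] += 1;
}

/* Returns 1 if both variants leave identical buffers, 0 if they differ,
   and -1 if LIMIT does not fit the check buffer.  */
int
same_result (int limit)
{
  int a[CHECK_LEN], b[CHECK_LEN];

  if (limit > CHECK_LEN)
    return -1;
  memset (a, 0, sizeof a);
  memset (b, 0, sizeof b);
  bump_short_iv (a, limit);
  bump_int_iv (b, limit);
  return memcmp (a, b, sizeof a) == 0;
}
```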


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: rguenther at suse dot de @ 2014-02-06 12:05 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 6 Feb 2014, paulo@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999
> 
> --- Comment #14 from Paulo J. Matos <paulo@matos-sorge.com> ---
> Something like this which looks much simpler hits the same problem:
> extern int arr[];
> 
> void
> foo32 (int limit)
> {
>   short i;
>   for (i = 0; (int)i < limit; i++)
>     arr[i] += 1;
> }

Exactly the same problem.  C integral type promotion rules make
that i = (short)((int)i + 1) again.  Note that (int)i + 1
does not overflow; (short) ((int)i + 1) invokes implementation-defined
behavior, which in our case is reduction modulo 2^N (N being the width of
short).

Nothing guarantees that (short)i + 1 does not overflow.


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-06 12:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #16 from Paulo J. Matos <paulo@matos-sorge.com> ---
(In reply to rguenther@suse.de from comment #15)
> Exactly the same problem.  C integral type promotion rules make
> that i = (short)((int)i + 1) again.  Note that (int)i + 1
> does not overflow, (short) ((int)i + 1) invokes implementation-defined
> behavior which in our case is modulo-2 reduction.
> 
> Nothing guarantees that (short)i + 1 does not overflow.

OK, that makes sense. But in GCC 4.8 that doesn't seem to be what happens.
It seems to be i = (short) ((unsigned short) i + 1)
Later i is cast to int for comparison.

Before ivopts this is the end of the loop body:
  i.7_19 = (unsigned short) i_26;
  _20 = i.7_19 + 1;
  i_21 = (short intD.8) _20;
  _10 = (intD.1) i_21;
  if (_10 < _25)
    goto <bb 7>;
  else
    goto <bb 6>;

i starts as a short and is converted to unsigned short. The addition is
performed in unsigned short and the result is converted back to short, which is
then cast to int for the comparison.

For GCC 4.5.4 the end of loop body is:
  iD.2767_18 = iD.2767_26 + 1;
  D.5046_9 = (intD.0) iD.2767_18;
  if (D.5046_9 < D.5047_25)
    goto <bb 5>;
  else
    goto <bb 6>;

Here the addition is made in short int and then there's only one cast to int.



* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-06 13:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #17 from Paulo J. Matos <paulo@matos-sorge.com> ---
(In reply to rguenther@suse.de from comment #15)
> On Thu, 6 Feb 2014, paulo@matos-sorge.com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999
> > 
> > --- Comment #14 from Paulo J. Matos <paulo@matos-sorge.com> ---
> > Something like this which looks much simpler hits the same problem:
> > extern int arr[];
> > 
> > void
> > foo32 (int limit)
> > {
> >   short i;
> >   for (i = 0; (int)i < limit; i++)
> >     arr[i] += 1;
> > }
> 
> Exactly the same problem.  C integral type promotion rules make
> that i = (short)((int)i + 1) again.  Note that (int)i + 1
> does not overflow, (short) ((int)i + 1) invokes implementation-defined
> behavior which in our case is modulo-2^N reduction.
> 
> Nothing guarantees that (short)i + 1 does not overflow.

I am being thick... indeed I forgot to notice that i++ also invokes undefined
behaviour. I guess then GCC sorts that out by casting i into unsigned short for
the addition and all the remaining issues then unfold.



* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: rguenther at suse dot de @ 2014-02-06 13:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 6 Feb 2014, paulo@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999
> 
> --- Comment #17 from Paulo J. Matos <paulo@matos-sorge.com> ---
> (In reply to rguenther@suse.de from comment #15)
> > On Thu, 6 Feb 2014, paulo@matos-sorge.com wrote:
> > 
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999
> > > 
> > > --- Comment #14 from Paulo J. Matos <paulo@matos-sorge.com> ---
> > > Something like this which looks much simpler hits the same problem:
> > > extern int arr[];
> > > 
> > > void
> > > foo32 (int limit)
> > > {
> > >   short i;
> > >   for (i = 0; (int)i < limit; i++)
> > >     arr[i] += 1;
> > > }
> > 
> > Exactly the same problem.  C integral type promotion rules make
> > that i = (short)((int)i + 1) again.  Note that (int)i + 1
> > does not overflow, (short) ((int)i + 1) invokes implementation-defined
> > > behavior which in our case is modulo-2^N reduction.
> > 
> > Nothing guarantees that (short)i + 1 does not overflow.
> 
> I am being thick... indeed I forgot to notice that i++ also invokes undefined
> behaviour. I guess then GCC sorts that out by casting i into unsigned short for
> the addition and all the remaining issues then unfold.

No, i++ doesn't invoke undefined behavior - that's the whole point,
and GCC got this wrong until it was fixed (4.5 is still broken).
With limit == SHRT_MAX + 1 the loop being endless is _valid_
(well, apart from arr[i] then being accessed out of bounds - looks
like an opportunity to derive that i can _not_ overflow ... ;))



* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: rguenther at suse dot de @ 2014-02-06 13:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #19 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 6 Feb 2014, paulo@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999
> 
> --- Comment #16 from Paulo J. Matos <paulo@matos-sorge.com> ---
> (In reply to rguenther@suse.de from comment #15)
> > Exactly the same problem.  C integral type promotion rules make
> > that i = (short)((int)i + 1) again.  Note that (int)i + 1
> > does not overflow, (short) ((int)i + 1) invokes implementation-defined
> > behavior which in our case is modulo-2^N reduction.
> > 
> > Nothing guarantees that (short)i + 1 does not overflow.
> 
> OK, that makes sense. But in GCC 4.8 that doesn't seem to be what happens.
> It seems to be i = (short) ((unsigned short) i + 1)
> Later i is cast to int for comparison.
> 
> Before ivopts this is the end of the loop body:
>   i.7_19 = (unsigned short) i_26;
>   _20 = i.7_19 + 1;
>   i_21 = (short intD.8) _20;
>   _10 = (intD.1) i_21;
>   if (_10 < _25)
>     goto <bb 7>;
>   else
>     goto <bb 6>;
> 
> i is initially a short, then moved to unsigned short. The addition is performed
> and returned to short. Then cast to int for the comparison.
> 
> For GCC 4.5.4 the end of loop body is:
>   iD.2767_18 = iD.2767_26 + 1;
>   D.5046_9 = (intD.0) iD.2767_18;
>   if (D.5046_9 < D.5047_25)
>     goto <bb 5>;
>   else
>     goto <bb 6>;
> 
> Here the addition is made in short int and then there's only one cast to int.

Yes, and thus GCC 4.5 still contains the bug of assuming that i++ invokes
undefined behavior when overflowing (which it does not).



* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-07 10:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #20 from Paulo J. Matos <paulo@matos-sorge.com> ---
OK, I was trying to make sense of all this and there are two things that stick
out.

One is that you say C integer promotion rules turn i++ into i =
(short)((int)i + 1). However, GCC is doing i = (short) ((unsigned short) i + 1).
Am I missing something that allows this or makes the addition in int equivalent
to the addition in unsigned short?

Secondly, we still have a dangling sign_extend later on that we could possibly
optimize. I find it hard to tell whether this can be done properly in expand,
or whether a small pass like ree, run before zero overhead loop generation,
would be better. What do you think?


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: rguenther at suse dot de @ 2014-02-07 11:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #21 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 7 Feb 2014, paulo@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999
> 
> --- Comment #20 from Paulo J. Matos <paulo@matos-sorge.com> ---
> OK, I was trying to make sense of all this and there are two things that stick
> out.
> 
> One is when you say that due to C integer promotion rules make i = 
> (short)((int)i + 1). However GCC is doing i = (short) ((unsigned short) 
> i + 1). Am I missing something that allows this or makes the addition in 
> int equivalent to the addition in unsigned short?

This is a valid shortening optimization GCC performs.
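
To see why the shortening is valid, the two forms can be compared over every
short value. This is only an illustrative sketch: inc_via_int, inc_via_ushort
and count_mismatches are hypothetical names, int is assumed to be wider than
short, and the narrowing conversions rely on GCC's documented modulo-2^N
behaviour.

```c
#include <limits.h>

/* Increment as the promotion rules write it: add in int, narrow to short. */
static short inc_via_int(short i)
{
    return (short)((int)i + 1);
}

/* Increment as it appears in the GCC 4.8 dump: add in unsigned short,
   then narrow to short. */
static short inc_via_ushort(short i)
{
    unsigned short u = (unsigned short)i;  /* well defined, modulo 2^N */
    u = (unsigned short)(u + 1);           /* wraps from USHRT_MAX to 0 */
    return (short)u;                       /* implementation-defined; GCC wraps */
}

/* Counts the short values for which the two forms disagree.
   Assumption: int is wider than short, so v can reach SHRT_MAX + 1
   and the loop terminates. */
static long count_mismatches(void)
{
    long mismatches = 0;
    for (int v = SHRT_MIN; v <= SHRT_MAX; v++)
        if (inc_via_int((short)v) != inc_via_ushort((short)v))
            mismatches++;
    return mismatches;
}
```

Both forms reduce the same mathematical value modulo 2^N, so
count_mismatches() is expected to return 0 under these assumptions.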

> Secondly we still have a dangling sign_extend later on that we could 
> possibly optimize. I find it hard to understand if this can be done 
> properly in expand or if a small pass like ree but before zero overhead 
> loop generation is better. What do you think?

That entirely depends on where the extension is generated and what
information is present there ... if it can be avoided at expand
time then that's surely the best thing to do.  Maybe it can even
be avoided on the GIMPLE level.



* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: paulo@matos-sorge.com @ 2014-02-12 13:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #22 from Paulo J. Matos <paulo@matos-sorge.com> ---
After some thought, I am concluding this cannot actually be optimized and that
GCC 4.5.4 only generated better code because it was taking advantage of an
undefined behaviour that doesn't exist.

The thought process is as follows. The whole issue has to do with this type
of loop:
void foo (int loopCount)
{
  short i;
  for (i = 0; (int)i < loopCount; i++)
    ...
}

GCC 4.5.4 was assuming i++ could have undefined behaviour and the increment was
done in type short. Then i was promoted to int through a sign_extend and
compared to loopCount. This undefined behaviour allows GCC 4.5.4 to generate an
int scev for the loop.

In GCC 4.8 or later (haven't tested with 4.6 or 4.7), i++ is known not to have
undefined behaviour. Due to C integer promotion rules, i++ is: i = (short)
((int) i + 1). GCC validly simplifies this to i = (short) ((unsigned short)i + 1).
This is then sign extended to int for comparison. GCC cannot generate an int
scev because it's not simple: (int) (short) {1, +, 1}_1.

This can validly loop forever if loopCount > SHRT_MAX.
For example, if loopCount is SHRT_MAX + 1, then when i reaches SHRT_MAX and
is incremented by one, the addition is fine because it is done in unsigned short
and then truncated modulo 2^N (implementation defined behaviour) to short.
i wraps to SHRT_MIN, therefore never reaching loopCount and looping forever.

In RTL we get the following sequence:
r4:SI <- [loopCount]
r0:HI <- 0

code label...

...

r2:HI <- r1:HI + 1
r3:SI <- sign_extend r2:HI

p0:BI <- r3:SI < r4:SI
loop to code label if p0:BI

I was tempted to simplify this to:
r4:SI <- [loopCount]
r0:SI <- 0

code label...

...

r2:SI <- r1:SI + 1

p0:BI <- r2:SI < r4:SI
loop to code label if p0:BI

However, this never loops forever, not even when r4:SI > SHRT_MAX (e.g.
SHRT_MAX + 1), where the original sequence does; therefore I think that, at
least in this case, this cannot be optimized.
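
The difference between the two sequences can be reproduced in plain C. This is
a sketch under stated assumptions, not the RTL itself: the function names and
ITERATION_CAP are hypothetical, the cap exists only so the demonstration
terminates, short is assumed to be 16 bits wide, and the narrowing conversion
relies on GCC's modulo-2^N behaviour.

```c
#include <limits.h>

/* Hypothetical cap, larger than the number of distinct 16-bit short
   values, so reaching it means the counter has wrapped around. */
#define ITERATION_CAP 100000L

/* Counter kept in short and sign-extended before each compare, like the
   first sequence. Returns the trip count, or -1 if the cap is reached. */
static long trips_narrow_counter(int loop_count)
{
    short i = 0;
    long trips = 0;
    while ((int)i < loop_count) {
        if (++trips > ITERATION_CAP)
            return -1;                       /* treated as "loops forever" */
        i = (short)((unsigned short)i + 1);  /* wraps SHRT_MAX -> SHRT_MIN */
    }
    return trips;
}

/* Counter widened to int with the sign extension dropped, like the
   tempting simplification. Returns the trip count, or -1 if the cap
   is reached. */
static long trips_wide_counter(int loop_count)
{
    int i = 0;
    long trips = 0;
    while (i < loop_count) {
        if (++trips > ITERATION_CAP)
            return -1;
        i = i + 1;
    }
    return trips;
}
```

For loop_count up to SHRT_MAX both functions return the same trip count. For
loop_count == SHRT_MAX + 1 the narrow version hits the cap while the wide one
returns SHRT_MAX + 1, so the two loops are not equivalent.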

I am tempted to close the bug report. Richard?


* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: rguenther at suse dot de @ 2014-02-12 13:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #23 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 12 Feb 2014, paulo@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999
> 
> --- Comment #22 from Paulo J. Matos <paulo@matos-sorge.com> ---
> After some thought, I am concluding this cannot actually be optimized and that
> GCC 4.5.4 was better because it was taking advantage of an undefined behaviour
> that doesn't exist.
> 
> The thought process is as follows. The whole process has to do with this type
> of loop:
> void foo (int loopCount)
> {
>   short i;
>   for (i = 0; (int)i < loopCount; i++)
>     ...
> }
> 
> GCC 4.5.4 was assuming i++ could have undefined behaviour and the increment was
> done in type short. Then i was promoted to int through a sign_extend and
> compared to loopCount. This undefined behaviour allows GCC 4.5.4 to generate an
> int scev for the loop.
> 
> In GCC 4.8 or later (haven't tested with 4.6 or 4.7), i++ is known not to have
> undefined behaviour. i++ due to C integer promotion rules is: i = (short)
> ((int) i + 1). GCC validly simplifies to i = (short) ((unsigned short)i + 1).
> This is then sign extended to int for comparison. GCC cannot generate an int
> scev because it's not simple: (int) (short) {1, +, 1}_1.
> 
> This can validly loop forever if loopCount > SHRT_MAX.
> For example, if loopCount is SHRT_MAX + 1, then when i reaches SHRT_MAX and
> is incremented by one the addition is fine because it is done in (unsigned short)
> and then truncated using modulo 2^N (implementation defined behaviour) to short,
> therefore never reaching loopCount and looping forever.
> 
> In RTL we get the following sequence:
> r4:SI <- [loopCount]
> r0:HI <- 0
> 
> code label...
> 
> ...
> 
> r2:HI <- r1:HI + 1
> r3:SI <- sign_extend r2:HI
> 
> p0:BI <- r3:SI < r4:SI
> loop to code label if p0:BI
> 
> I was tempted to simplify this to:
> r4:SI <- [loopCount]
> r0:SI <- 0
> 
> code label...
> 
> ...
> 
> r2:SI <- r1:SI + 1
> 
> p0:BI <- r2:SI < r4:SI
> loop to code label if p0:BI
> 
> However this will never have an infinite loop behaviour if r4:SI > SHRT_MAX,
> therefore I think that at least in this case this cannot be optimized.
> 
> I am tempted to close the bug report. Richard?

Yes.  That sounds correct.



* [Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
From: pmatos at gcc dot gnu.org @ 2014-02-12 13:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

pmatos at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #24 from pmatos at gcc dot gnu.org ---
Closing as invalid. Thanks Richard.

