public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation
@ 2014-01-30 14:05 Paulo Matos
  2014-01-30 14:37 ` Andreas Schwab
  0 siblings, 1 reply; 7+ messages in thread
From: Paulo Matos @ 2014-01-30 14:05 UTC (permalink / raw)
  To: gcc

Hello,

I am tracking a performance and size regression from 4.5.4 present in trunk.
Consider the following function:
==
extern short delayLength;
typedef int Sample;
extern Sample *temp_ptr;
extern Sample x;

void
foo (short blockSize)
{
  short i;
  unsigned short loopCount;

  loopCount = (unsigned short) (blockSize + delayLength) % 8;
  for (i = 0; i < loopCount; i++)
      *temp_ptr++ = x ^ *temp_ptr++;
}
==

For v850, before the commit
commit e0ae2fe2a0bebe9de31e3d8eb4feace4909ef009
Author: vries <vries@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri May 20 19:32:30 2011 +0000

    2011-05-20  Tom de Vries  <tom@codesourcery.com>
    
        PR target/45098
        * tree-ssa-loop-ivopts.c: Include expmed.h.
        (get_shiftadd_cost): New function.
        (force_expr_to_var_cost): Declare forward.  Use get_shiftadd_cost.
    
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@173976 138bc75d-0d04-0410-961f-82ee72b054a4

gcc generated for -O2:
_foo:
        movhi hi(_delayLength),r0,r10
        ld.h lo(_delayLength)[r10],r10
        add r10,r6
        andi 7,r6,r10
        be .L1
        movhi hi(_temp_ptr),r0,r16
        ld.w lo(_temp_ptr)[r16],r15
        mov r10,r17
        shl 2,r17
        mov r15,r14
        movhi hi(_x),r0,r13
        mov r15,r10
        add r17,r14
        movea lo(_x),r13,r13
.L3:
        ld.w 0[r10],r11
        ld.w 0[r13],r12
        xor r12,r11
        st.w r11,0[r10]
        add 4,r10
        cmp r14,r10
        bne .L3
        mov r15,r10
        add r17,r10
        st.w r10,lo(_temp_ptr)[r16]
.L1:
        jmp [r31]

After the commit it generates:
_foo:
        movhi hi(_delayLength),r0,r10
        ld.h lo(_delayLength)[r10],r16
        add r16,r6
        andi 7,r6,r16
        be .L1
        movhi hi(_temp_ptr),r0,r17
        ld.w lo(_temp_ptr)[r17],r18
        movhi hi(_x),r0,r14
        mov r18,r11
        mov r16,r15
        mov 0,r10
        movea lo(_x),r14,r14
.L3:
        ld.w 0[r11],r12
        ld.w 0[r14],r13
        add 1,r10
        xor r13,r12
        shl 16,r10
        st.w r12,0[r11]
        sar 16,r10
        add 4,r11
        cmp r15,r10
        bne .L3
        shl 2,r16
        add r18,r16
        st.w r16,lo(_temp_ptr)[r17]
.L1:
        jmp [r31]

The problem is inside the loop: 
        shl 16,r10
        st.w r12,0[r11]
        sar 16,r10
        add 4,r11
        cmp r15,r10


shl followed by sar is used to sign extend r10 which was in previous gcc versions not being done and it is unnecessary.
At the point of commit v850 didn't have e3v5 support or zero overhead loops but now it does and this blocks generation of zero overhead loops. (with trunk and -mv850e3v5, gcc generates a sxh instruction instead of the shift pattern but the point is the same).

For mep the situation repeats. mep generates:
foo:
        # frame: 8   8 regs
        lh      $10, %sdaoff(delayLength)($gp)
        add     $sp, -8
        add3    $1, $1, $10
        and3    $10, $1, 0x7
        beqz    $10, .L1
        lw      $2, %sdaoff(temp_ptr)($gp)
        mov     $1, 0
        add3    $11, $gp, %sdaoff(x)
        bra     .L5
.L3:
        mov     $2, $9
.L5:
        lw      $0, ($11)
        lw      $3, 4($2)
        add     $1, 1
        exth    $1
        xor     $3, $0
        slt3    $0, $1, $10
        add3    $9, $2, 8
        sw      $3, ($2)
        bnez    $0, .L3
        sw      $9, %sdaoff(temp_ptr)($gp)
.L1:
        add     $sp, 8
        ret


Again exth signextends $1 and blocks generation of zero overhead loop because suddenly loop is not simple anymore. Unfortunately I cannot test mep before the patch as at the time mep was not in mainline.

Does anyone understand why the mentioned patch is forcing the generation of the sign extend inside the loop? Is this just a problem with cost calculation in the backends or some issue lurking in tree-ssa-loop-ivopts?

Thanks,

Paulo Matos



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation
  2014-01-30 14:05 Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation Paulo Matos
@ 2014-01-30 14:37 ` Andreas Schwab
  2014-01-30 15:00   ` Paulo Matos
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2014-01-30 14:37 UTC (permalink / raw)
  To: Paulo Matos; +Cc: gcc

Paulo Matos <pmatos@broadcom.com> writes:

> void
> foo (short blockSize)
> {
>   short i;
>   unsigned short loopCount;
>
>   loopCount = (unsigned short) (blockSize + delayLength) % 8;
>   for (i = 0; i < loopCount; i++)
>       *temp_ptr++ = x ^ *temp_ptr++;
> }

You know that this is undefined code?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation
  2014-01-30 14:37 ` Andreas Schwab
@ 2014-01-30 15:00   ` Paulo Matos
  2014-01-30 15:19     ` Andreas Schwab
  0 siblings, 1 reply; 7+ messages in thread
From: Paulo Matos @ 2014-01-30 15:00 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gcc

> -----Original Message-----
> From: Andreas Schwab [mailto:schwab@linux-m68k.org]
> Sent: 30 January 2014 14:29
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead
> loop generation
> 
> Paulo Matos <pmatos@broadcom.com> writes:
> 
> > void
> > foo (short blockSize)
> > {
> >   short i;
> >   unsigned short loopCount;
> >
> >   loopCount = (unsigned short) (blockSize + delayLength) % 8;
> >   for (i = 0; i < loopCount; i++)
> >       *temp_ptr++ = x ^ *temp_ptr++;
> > }
> 
> You know that this is undefined code?
>

Correct, my apologies. I didn't notice the undefined behaviour that the reducer introduced in the original code.
However, the issue persists.
If instead I write:
void
foo (short blockSize)
{
  short i;
  unsigned short loopCount;
  loopCount = (unsigned short) (blockSize + delayLength) % 8;
  for (i = 0; i < loopCount; i++)
      *temp_ptr++ = x ^ *temp_ptr;
}

the sign extend is still generated from the referenced commit and persists until trunk. This causes the zero overhead loop not to be generated.

Thanks,

Paulo Matos
 
> Andreas.
> 
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation
  2014-01-30 15:00   ` Paulo Matos
@ 2014-01-30 15:19     ` Andreas Schwab
  2014-01-30 17:29       ` Paulo Matos
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2014-01-30 15:19 UTC (permalink / raw)
  To: Paulo Matos; +Cc: gcc

Paulo Matos <pmatos@broadcom.com> writes:

> If instead I write:
> void
> foo (short blockSize)
> {
>   short i;
>   unsigned short loopCount;
>   loopCount = (unsigned short) (blockSize + delayLength) % 8;
>   for (i = 0; i < loopCount; i++)
>       *temp_ptr++ = x ^ *temp_ptr;
> }

This is still undefined.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation
  2014-01-30 15:19     ` Andreas Schwab
@ 2014-01-30 17:29       ` Paulo Matos
  2014-01-30 18:07         ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Paulo Matos @ 2014-01-30 17:29 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gcc

> -----Original Message-----
> From: Andreas Schwab [mailto:schwab@linux-m68k.org]
> Sent: 30 January 2014 15:15
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead
> loop generation
> 
> Paulo Matos <pmatos@broadcom.com> writes:
> 
> > If instead I write:
> > void
> > foo (short blockSize)
> > {
> >   short i;
> >   unsigned short loopCount;
> >   loopCount = (unsigned short) (blockSize + delayLength) % 8;
> >   for (i = 0; i < loopCount; i++)
> >       *temp_ptr++ = x ^ *temp_ptr;
> > }
> 
> This is still undefined.
> 

OK, of course. Don't know what I am doing today.
It's undefined because 'i' might overflow... I will get back to this. Thanks for pointing this out.

> Andreas.
> 
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation
  2014-01-30 17:29       ` Paulo Matos
@ 2014-01-30 18:07         ` Jeff Law
  2014-01-31 10:40           ` Paulo Matos
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2014-01-30 18:07 UTC (permalink / raw)
  To: Paulo Matos, Andreas Schwab; +Cc: gcc

On 01/30/14 08:19, Paulo Matos wrote:
>> -----Original Message-----
>> From: Andreas Schwab [mailto:schwab@linux-m68k.org]
>> Sent: 30 January 2014 15:15
>> To: Paulo Matos
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead
>> loop generation
>>
>> Paulo Matos <pmatos@broadcom.com> writes:
>>
>>> If instead I write:
>>> void
>>> foo (short blockSize)
>>> {
>>>    short i;
>>>    unsigned short loopCount;
>>>    loopCount = (unsigned short) (blockSize + delayLength) % 8;
>>>    for (i = 0; i < loopCount; i++)
>>>        *temp_ptr++ = x ^ *temp_ptr;
>>> }
>>
>> This is still undefined.
>>
>
> OK, of course. Don't know what I am doing today.
> It's undefined because 'i' might overflow... I will get back to this. Thanks for pointing this out.
When you've got it sorted out, go ahead and file a BZ, include the 
regression markers so that it shows up in the searches most of us are 
paying the most attention to right now.

jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation
  2014-01-30 18:07         ` Jeff Law
@ 2014-01-31 10:40           ` Paulo Matos
  0 siblings, 0 replies; 7+ messages in thread
From: Paulo Matos @ 2014-01-31 10:40 UTC (permalink / raw)
  To: Jeff Law, Andreas Schwab; +Cc: gcc


> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: 30 January 2014 15:50
> To: Paulo Matos; Andreas Schwab
> Cc: gcc@gcc.gnu.org
> Subject: Re: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead
> loop generation
> 
> >
> > OK, of course. Don't know what I am doing today.
> > It's undefined because 'i' might overflow... I will get back to this. Thanks
> for pointing this out.
> When you've got it sorted out, go ahead and file a BZ, include the
> regression markers so that it shows up in the searches most of us are
> paying the most attention to right now.
> 
> jeff

Sure. I opened it as 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

Paulo Matos

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-01-31  9:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-30 14:05 Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation Paulo Matos
2014-01-30 14:37 ` Andreas Schwab
2014-01-30 15:00   ` Paulo Matos
2014-01-30 15:19     ` Andreas Schwab
2014-01-30 17:29       ` Paulo Matos
2014-01-30 18:07         ` Jeff Law
2014-01-31 10:40           ` Paulo Matos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).