public inbox for gcc-patches@gcc.gnu.org
* [Patch, RTL] Eliminate redundant vec_select moves.
@ 2013-11-06 13:37 Tejas Belagod
  2013-11-06 14:07 ` Richard Biener
  2013-11-06 14:25 ` Richard Sandiford
  0 siblings, 2 replies; 76+ messages in thread
From: Tejas Belagod @ 2013-11-06 13:37 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]


Hi,

The attached patch eliminates moves of the form

	(set (reg:DI n) (vec_select:DI (reg:V2DI n) (parallel [(const_int 0)])))

i.e. it eliminates lower-lane moves in which src and dst are the same register,
so that RTL instead uses the destination register directly in the required mode.

Also, if my understanding of Big-Endian is correct, this should be safe for 
big-endian targets as well.

I've bootstrapped this on x86_64 and regression-tested it on aarch64-none-elf
and aarch64_be-none-elf.

OK for trunk?

Thanks,
Tejas Belagod
ARM.

2013-11-06  Tejas Belagod  <tejas.belagod@arm.com>

gcc/
	* rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select
	for same src and dst.

[-- Attachment #2: red-move-2.txt --]
[-- Type: text/plain, Size: 850 bytes --]

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 9769b69..3e434cd 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1180,6 +1180,25 @@ set_noop_p (const_rtx set)
       dst = SUBREG_REG (dst);
     }
 
+  /* This is big-endian-safe because the elements are kept in target
+     memory order.  So, for eg. PARALLEL element value of 2 is the same in
+     either endian-ness.  */
+  if (GET_CODE (src) == VEC_SELECT
+      && REG_P (XEXP (src, 0)) && REG_P (dst)
+      && REGNO (XEXP (src, 0)) == REGNO (dst))
+    {
+      rtx par = XEXP (src, 1);
+      int i;
+
+      for (i = 0; i < XVECLEN (par, 0); i++)
+	{
+	  rtx tem = XVECEXP (par, 0, i);
+	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
+	    return 0;
+	}
+      return 1;
+    }
+
   return (REG_P (src) && REG_P (dst)
 	  && REGNO (src) == REGNO (dst));
 }

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 13:37 [Patch, RTL] Eliminate redundant vec_select moves Tejas Belagod
@ 2013-11-06 14:07 ` Richard Biener
  2013-11-06 16:45   ` Bill Schmidt
  2013-11-06 14:25 ` Richard Sandiford
  1 sibling, 1 reply; 76+ messages in thread
From: Richard Biener @ 2013-11-06 14:07 UTC (permalink / raw)
  To: Tejas Belagod, William J. Schmidt; +Cc: gcc-patches

On Wed, Nov 6, 2013 at 2:24 PM, Tejas Belagod <tbelagod@arm.com> wrote:
>
> Hi,
>
> The attached patch eliminates moves of the form
>
>         set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
>
> i.e. eliminates lower lane moves between src and dst where src and dst are
> the same register and this causes rtl to instead use the destination
> register in the required mode.
>
> Also, if my understanding of Big-Endian is correct, this should be safe for
> big-endian targets as well.
>
> I've bootstrapped this on x86_64, regressed on aarch64-none-elf,
> aarch64_be-none-elf.
>
> OK for trunk?

It looks good to me (but I'm also wondering about bigendian).  Bill?

Can you add a testcase where this has an effect on code generation?

Thanks,
Richard.

> Thanks,
> Tejas Belagod
> ARM.
>
> 2013-11-06  Tejas Belagod  <tejas.belagod@arm.com>
>
> gcc/
>         * rtlanal.c (set_noop_p): Return nonzero in case of redundant
> vec_select
>         for same src and dst.
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 9769b69..3e434cd 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1180,6 +1180,25 @@ set_noop_p (const_rtx set)
>        dst = SUBREG_REG (dst);
>      }
>
> +  /* This is big-endian-safe because the elements are kept in target
> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
> +     either endian-ness.  */
> +  if (GET_CODE (src) == VEC_SELECT
> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
> +    {
> +      rtx par = XEXP (src, 1);
> +      int i;
> +
> +      for (i = 0; i < XVECLEN (par, 0); i++)
> +       {
> +         rtx tem = XVECEXP (par, 0, i);
> +         if (!CONST_INT_P (tem) || INTVAL (tem) != i)
> +           return 0;
> +       }
> +      return 1;
> +    }
> +
>    return (REG_P (src) && REG_P (dst)
>           && REGNO (src) == REGNO (dst));
>  }


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 13:37 [Patch, RTL] Eliminate redundant vec_select moves Tejas Belagod
  2013-11-06 14:07 ` Richard Biener
@ 2013-11-06 14:25 ` Richard Sandiford
  2013-11-06 15:37   ` Tejas Belagod
  1 sibling, 1 reply; 76+ messages in thread
From: Richard Sandiford @ 2013-11-06 14:25 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: gcc-patches

Tejas Belagod <tbelagod@arm.com> writes:
> +  /* This is big-endian-safe because the elements are kept in target
> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
> +     either endian-ness.  */
> +  if (GET_CODE (src) == VEC_SELECT
> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
> +    {
> +      rtx par = XEXP (src, 1);
> +      int i;
> +
> +      for (i = 0; i < XVECLEN (par, 0); i++)
> +	{
> +	  rtx tem = XVECEXP (par, 0, i);
> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
> +	    return 0;
> +	}
> +      return 1;
> +    }
> +

I think for big endian it needs to be:

    INTVAL (tem) != i + base

where base is something like:

    int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);

E.g. a big-endian V4HI looks like:

    msb          lsb
    0000111122223333

and shortening it to say V2HI only gives the low 32 bits:

            msb  lsb
            22223333
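
To make the offset concrete, here is a minimal standalone sketch in plain C.
It is not GCC code: the function name and its parameters (sel, len, src_nunits,
big_endian) are hypothetical stand-ins for the PARALLEL, XVECLEN,
GET_MODE_NUNITS and the target's endianness, and it assumes the narrower
result is read from the low-order bits of the register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the PARALLEL of a vec_select: SEL holds LEN
   constant lane indices into a source vector of SRC_NUNITS lanes.
   Return 1 if SEL names the run of lanes that sits in the low-order
   bits of the source register, 0 otherwise.  */
int
selects_low_order_lanes (const int *sel, int len, int src_nunits,
			 int big_endian)
{
  assert (sel != NULL && len > 0 && len <= src_nunits);

  /* Lane 0 is at the lsb end on little-endian; on big-endian the
     last LEN lanes are.  */
  int base = big_endian ? src_nunits - len : 0;

  for (int i = 0; i < len; i++)
    if (sel[i] != i + base)
      return 0;
  return 1;
}
```

Under these assumptions, selecting lanes {2, 3} of a four-lane vector passes
the check on big-endian, while {0, 1} passes only on little-endian.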

Thanks,
Richard


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 14:25 ` Richard Sandiford
@ 2013-11-06 15:37   ` Tejas Belagod
  2013-11-06 16:14     ` Richard Sandiford
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-11-06 15:37 UTC (permalink / raw)
  To: gcc-patches, rdsandiford

Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>> +  /* This is big-endian-safe because the elements are kept in target
>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
>> +     either endian-ness.  */
>> +  if (GET_CODE (src) == VEC_SELECT
>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>> +    {
>> +      rtx par = XEXP (src, 1);
>> +      int i;
>> +
>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>> +	{
>> +	  rtx tem = XVECEXP (par, 0, i);
>> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>> +	    return 0;
>> +	}
>> +      return 1;
>> +    }
>> +
> 
> I think for big endian it needs to be:
> 
>     INTVAL (tem) != i + base
> 
> where base is something like:
> 
>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
> 
> E.g. a big-endian V4HI looks like:
> 
>     msb          lsb
>     0000111122223333
> 
> and shortening it to say V2HI only gives the low 32 bits:
> 
>             msb  lsb
>             22223333

But, in this case we want

         msb  lsb
         00001111

I was under the impression that the const vector parallel for vec_select 
represents the element indexes of the array in memory order. Therefore,

in bigendian,

          msb             lsb
          0000 1111 2222 3333
element  a[0] a[1] a[2] a[3]

and in littleendian

          msb             lsb
          3333 2222 1111 0000
element  a[3] a[2] a[1] a[0]


so shouldn't a
   (vec_select:V2HI (reg:V4HI) (parallel [(const_int 0) (const_int 1)]))

represent the elements {0000, 1111} in both endiannesses? Is my understanding 
broken?

Thanks,
Tejas.


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 15:37   ` Tejas Belagod
@ 2013-11-06 16:14     ` Richard Sandiford
  2013-11-06 17:11       ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Sandiford @ 2013-11-06 16:14 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: gcc-patches

Tejas Belagod <tbelagod@arm.com> writes:
> Richard Sandiford wrote:
>> Tejas Belagod <tbelagod@arm.com> writes:
>>> +  /* This is big-endian-safe because the elements are kept in target
>>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
>>> +     either endian-ness.  */
>>> +  if (GET_CODE (src) == VEC_SELECT
>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>>> +    {
>>> +      rtx par = XEXP (src, 1);
>>> +      int i;
>>> +
>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>>> +	{
>>> +	  rtx tem = XVECEXP (par, 0, i);
>>> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>> +	    return 0;
>>> +	}
>>> +      return 1;
>>> +    }
>>> +
>> 
>> I think for big endian it needs to be:
>> 
>>     INTVAL (tem) != i + base
>> 
>> where base is something like:
>> 
>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
>> 
>> E.g. a big-endian V4HI looks like:
>> 
>>     msb          lsb
>>     0000111122223333
>> 
>> and shortening it to say V2HI only gives the low 32 bits:
>> 
>>             msb  lsb
>>             22223333
>
> But, in this case we want
>
>          msb  lsb
>          00001111

It depends on whether the result occupies a full register or not.
I was thinking of the case where it didn't, but I realise now you were
thinking of the case where it did.  And yeah, my suggestion doesn't
cope with that...

> I was under the impression that the const vector parallel for vec_select 
> represents the element indexes of the array in memory order.
>
> Therefore, in bigendian,
>
>           msb             lsb
>           0000 1111 2222 3333
> element  a[0] a[1] a[2] a[3]
>
> and in littleendian
>
>           msb             lsb
>           3333 2222 1111 0000
> element  a[3] a[2] a[1] a[0]

Right.  But if an N-bit value is stored in a register, it's assumed to
occupy the lsb of the register and the N-1 bits above that.  The other
bits in the register are don't-care.

E.g., leaving vectors to one side, if you have:

   (set (reg:HI N) (truncate:HI (reg:SI N)))

on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:

   msb  lsb
   01234567
       VVVV
   xxxx4567

rather than:

   msb  lsb
   01234567
   VVVV
   0123xxxx

for both endiannesses.  The same principle applies to vectors.
The lsb of the register is always assumed to be significant.

So maybe the original patch was correct for partial-register and
full-register results on little-endian, but only for full-register
results on big-endian.
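
The two diagrams above can be reproduced with a small standalone sketch in
plain C. It is only a model, not GCC code: the function names are hypothetical,
the scalar register is assumed to be 32 bits wide, and the V4HI value is
assumed to fill one 64-bit register with its lanes laid out in memory order.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar case: an HImode value held in a 32-bit register is taken to
   be the low 16 bits, regardless of endianness.  */
uint16_t
low_hi_of_si (uint32_t reg)
{
  return (uint16_t) (reg & 0xffff);
}

/* Hypothetical model of a V4HI value in one 64-bit register.  Lane 0
   goes to the msb end on big-endian and to the lsb end on
   little-endian.  */
uint64_t
pack_v4hi (const uint16_t lane[4], int big_endian)
{
  assert (lane);

  uint64_t reg = 0;
  for (int i = 0; i < 4; i++)
    {
      int shift = big_endian ? 16 * (3 - i) : 16 * i;
      reg |= (uint64_t) lane[i] << shift;
    }
  return reg;
}

/* Vector case: a V2HI-sized value read from the same register is
   taken to be the low 32 bits.  */
uint32_t
low_v2hi_of_v4hi (uint64_t reg)
{
  return (uint32_t) (reg & 0xffffffffu);
}
```

With lanes {0x0000, 0x1111, 0x2222, 0x3333}, the low 32 bits hold lanes 2 and 3
in the big-endian layout and lanes 1 and 0 in the little-endian one, which is
why a selection starting at lane 0 only lines up with the lsb on little-endian.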

Thanks,
Richard


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 14:07 ` Richard Biener
@ 2013-11-06 16:45   ` Bill Schmidt
  2013-11-06 17:07     ` Bill Schmidt
  0 siblings, 1 reply; 76+ messages in thread
From: Bill Schmidt @ 2013-11-06 16:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tejas Belagod, gcc-patches

On Wed, 2013-11-06 at 15:01 +0100, Richard Biener wrote:
> On Wed, Nov 6, 2013 at 2:24 PM, Tejas Belagod <tbelagod@arm.com> wrote:
> >
> > Hi,
> >
> > The attached patch eliminates moves of the form
> >
> >         set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
> >
> > i.e. eliminates lower lane moves between src and dst where src and dst are
> > the same register and this causes rtl to instead use the destination
> > register in the required mode.
> >
> > Also, if my understanding of Big-Endian is correct, this should be safe for
> > big-endian targets as well.
> >
> > I've bootstrapped this on x86_64, regressed on aarch64-none-elf,
> > aarch64_be-none-elf.
> >
> > OK for trunk?
> 
> It looks good to me (but I'm also wondering about bigendian).  Bill?

Yes, that should be ok.  I can run a regression test to confirm.

Thanks,
Bill

> 
> Can you add a testcase where this has an effect on code generation?
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Tejas Belagod
> > ARM.
> >
> > 2013-11-06  Tejas Belagod  <tejas.belagod@arm.com>
> >
> > gcc/
> >         * rtlanal.c (set_noop_p): Return nonzero in case of redundant
> > vec_select
> >         for same src and dst.
> > diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> > index 9769b69..3e434cd 100644
> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -1180,6 +1180,25 @@ set_noop_p (const_rtx set)
> >        dst = SUBREG_REG (dst);
> >      }
> >
> > +  /* This is big-endian-safe because the elements are kept in target
> > +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
> > +     either endian-ness.  */
> > +  if (GET_CODE (src) == VEC_SELECT
> > +      && REG_P (XEXP (src, 0)) && REG_P (dst)
> > +      && REGNO (XEXP (src, 0)) == REGNO (dst))
> > +    {
> > +      rtx par = XEXP (src, 1);
> > +      int i;
> > +
> > +      for (i = 0; i < XVECLEN (par, 0); i++)
> > +       {
> > +         rtx tem = XVECEXP (par, 0, i);
> > +         if (!CONST_INT_P (tem) || INTVAL (tem) != i)
> > +           return 0;
> > +       }
> > +      return 1;
> > +    }
> > +
> >    return (REG_P (src) && REG_P (dst)
> >           && REGNO (src) == REGNO (dst));
> >  }
> 


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 16:45   ` Bill Schmidt
@ 2013-11-06 17:07     ` Bill Schmidt
  0 siblings, 0 replies; 76+ messages in thread
From: Bill Schmidt @ 2013-11-06 17:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tejas Belagod, gcc-patches



On Wed, 2013-11-06 at 10:42 -0600, Bill Schmidt wrote:
> On Wed, 2013-11-06 at 15:01 +0100, Richard Biener wrote:
> > On Wed, Nov 6, 2013 at 2:24 PM, Tejas Belagod <tbelagod@arm.com> wrote:
> > >
> > > Hi,
> > >
> > > The attached patch eliminates moves of the form
> > >
> > >         set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
> > >
> > > i.e. eliminates lower lane moves between src and dst where src and dst are
> > > the same register and this causes rtl to instead use the destination
> > > register in the required mode.
> > >
> > > Also, if my understanding of Big-Endian is correct, this should be safe for
> > > big-endian targets as well.
> > >
> > > I've bootstrapped this on x86_64, regressed on aarch64-none-elf,
> > > aarch64_be-none-elf.
> > >
> > > OK for trunk?
> > 
> > It looks good to me (but I'm also wondering about bigendian).  Bill?
> 
> Yes, that should be ok.  I can run a regression test to confirm.

Never mind, I agree with Richard S.  The direction of the ordering is
fine, but an offset is needed.

Bill

> 
> Thanks,
> Bill
> 
> > 
> > Can you add a testcase where this has an effect on code generation?
> > 
> > Thanks,
> > Richard.
> > 
> > > Thanks,
> > > Tejas Belagod
> > > ARM.
> > >
> > > 2013-11-06  Tejas Belagod  <tejas.belagod@arm.com>
> > >
> > > gcc/
> > >         * rtlanal.c (set_noop_p): Return nonzero in case of redundant
> > > vec_select
> > >         for same src and dst.
> > > diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> > > index 9769b69..3e434cd 100644
> > > --- a/gcc/rtlanal.c
> > > +++ b/gcc/rtlanal.c
> > > @@ -1180,6 +1180,25 @@ set_noop_p (const_rtx set)
> > >        dst = SUBREG_REG (dst);
> > >      }
> > >
> > > +  /* This is big-endian-safe because the elements are kept in target
> > > +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
> > > +     either endian-ness.  */
> > > +  if (GET_CODE (src) == VEC_SELECT
> > > +      && REG_P (XEXP (src, 0)) && REG_P (dst)
> > > +      && REGNO (XEXP (src, 0)) == REGNO (dst))
> > > +    {
> > > +      rtx par = XEXP (src, 1);
> > > +      int i;
> > > +
> > > +      for (i = 0; i < XVECLEN (par, 0); i++)
> > > +       {
> > > +         rtx tem = XVECEXP (par, 0, i);
> > > +         if (!CONST_INT_P (tem) || INTVAL (tem) != i)
> > > +           return 0;
> > > +       }
> > > +      return 1;
> > > +    }
> > > +
> > >    return (REG_P (src) && REG_P (dst)
> > >           && REGNO (src) == REGNO (dst));
> > >  }
> > 


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 16:14     ` Richard Sandiford
@ 2013-11-06 17:11       ` Tejas Belagod
  2013-11-06 18:34         ` Richard Sandiford
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-11-06 17:11 UTC (permalink / raw)
  To: gcc-patches, rdsandiford

Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>> Richard Sandiford wrote:
>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>> +  /* This is big-endian-safe because the elements are kept in target
>>>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
>>>> +     either endian-ness.  */
>>>> +  if (GET_CODE (src) == VEC_SELECT
>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>> +    {
>>>> +      rtx par = XEXP (src, 1);
>>>> +      int i;
>>>> +
>>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>>>> +	{
>>>> +	  rtx tem = XVECEXP (par, 0, i);
>>>> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>> +	    return 0;
>>>> +	}
>>>> +      return 1;
>>>> +    }
>>>> +
>>> I think for big endian it needs to be:
>>>
>>>     INTVAL (tem) != i + base
>>>
>>> where base is something like:
>>>
>>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
>>>
>>> E.g. a big-endian V4HI looks like:
>>>
>>>     msb          lsb
>>>     0000111122223333
>>>
>>> and shortening it to say V2HI only gives the low 32 bits:
>>>
>>>             msb  lsb
>>>             22223333
>> But, in this case we want
>>
>>          msb  lsb
>>          00001111
> 
> It depends on whether the result occupies a full register or not.
> I was thinking of the case where it didn't, but I realise now you were
> thinking of the case where it did.  And yeah, my suggestion doesn't
> cope with that...
> 
>> I was under the impression that the const vector parallel for vec_select 
>> represents the element indexes of the array in memory order.
>>
>> Therefore, in bigendian,
>>
>>           msb             lsb
>>           0000 1111 2222 3333
>> element  a[0] a[1] a[2] a[3]
>>
>> and in littleendian
>>
>>           msb             lsb
>>           3333 2222 1111 0000
>> element  a[3] a[2] a[1] a[0]
> 
> Right.  But if an N-bit value is stored in a register, it's assumed to
> occupy the lsb of the register and the N-1 bits above that.  The other
> bits in the register are don't-care.
> 
> E.g., leaving vectors to one side, if you have:
> 
>    (set (reg:HI N) (truncate:HI (reg:SI N)))
> 
> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
> 
>    msb  lsb
>    01234567
>        VVVV
>    xxxx4567
> 
> rather than:
> 
>    msb  lsb
>    01234567
>    VVVV
>    0123xxxx
> 
> for both endiannesses.  The same principle applies to vectors.
> The lsb of the register is always assumed to be significant.
> 
> So maybe the original patch was correct for partial-register and
> full-register results on little-endian, but only for full-register
> results on big-endian.

Ah, ok! I think I get it. By eliminating

	(set (reg:DI n) (vec_select:DI (reg:V2DI n) (parallel [(const_int 0)])))

using the check INTVAL (tem) != i, I'm essentially making subsequent operations
use (reg:V2DI n) in DI mode, which is a partial-register result, and on
big-endian this gives me the wrong set of lanes. So, if I want to use (reg n)
in partial-register mode, I have to make sure the selected elements coincide
with the lsb end of the register on big-endian...

Thanks for your input, I'll apply the offset correction for big-endian you 
suggested. I'll respin the patch.
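
As a worked illustration of the V2DI case, here is a minimal standalone sketch
in plain C. It is a model only, not GCC code: struct vreg128 and the three
functions are hypothetical, and it assumes a 128-bit register whose DImode
view is its low-order 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two 64-bit halves.  */
struct vreg128
{
  uint64_t high;	/* Most-significant 64 bits.  */
  uint64_t low;		/* Least-significant 64 bits.  */
};

/* Place the V2DI lanes in memory order: lane 0 is at the msb end on
   big-endian and at the lsb end on little-endian.  */
struct vreg128
pack_v2di (uint64_t lane0, uint64_t lane1, int big_endian)
{
  struct vreg128 r;
  r.high = big_endian ? lane0 : lane1;
  r.low = big_endian ? lane1 : lane0;
  return r;
}

/* Assumed DImode view of the register: its low-order 64 bits.  */
uint64_t
read_as_di (struct vreg128 r)
{
  return r.low;
}

/* Lane index that the DImode view aliases under this model.  */
int
lane_seen_as_di (int big_endian)
{
  assert (big_endian == 0 || big_endian == 1);
  return big_endian ? 1 : 0;
}
```

In this model, treating a move of lane 0 as a no-op leaves later DImode uses
reading lane 1 on big-endian, which matches the wrong-lane problem described
above.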

Thanks,
Tejas.


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 17:11       ` Tejas Belagod
@ 2013-11-06 18:34         ` Richard Sandiford
  2013-11-06 19:42           ` Bill Schmidt
  2013-11-07 14:37           ` Tejas Belagod
  0 siblings, 2 replies; 76+ messages in thread
From: Richard Sandiford @ 2013-11-06 18:34 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: gcc-patches

Tejas Belagod <tbelagod@arm.com> writes:
> Richard Sandiford wrote:
>> Tejas Belagod <tbelagod@arm.com> writes:
>>> Richard Sandiford wrote:
>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>> +  /* This is big-endian-safe because the elements are kept in target
>>>>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
>>>>> +     either endian-ness.  */
>>>>> +  if (GET_CODE (src) == VEC_SELECT
>>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>>> +    {
>>>>> +      rtx par = XEXP (src, 1);
>>>>> +      int i;
>>>>> +
>>>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>>>>> +	{
>>>>> +	  rtx tem = XVECEXP (par, 0, i);
>>>>> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>>> +	    return 0;
>>>>> +	}
>>>>> +      return 1;
>>>>> +    }
>>>>> +
>>>> I think for big endian it needs to be:
>>>>
>>>>     INTVAL (tem) != i + base
>>>>
>>>> where base is something like:
>>>>
>>>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
>>>>
>>>> E.g. a big-endian V4HI looks like:
>>>>
>>>>     msb          lsb
>>>>     0000111122223333
>>>>
>>>> and shortening it to say V2HI only gives the low 32 bits:
>>>>
>>>>             msb  lsb
>>>>             22223333
>>> But, in this case we want
>>>
>>>          msb  lsb
>>>          00001111
>> 
>> It depends on whether the result occupies a full register or not.
>> I was thinking of the case where it didn't, but I realise now you were
>> thinking of the case where it did.  And yeah, my suggestion doesn't
>> cope with that...
>> 
>>> I was under the impression that the const vector parallel for vec_select 
>>> represents the element indexes of the array in memory order.
>>>
>>> Therefore, in bigendian,
>>>
>>>           msb             lsb
>>>           0000 1111 2222 3333
>>> element  a[0] a[1] a[2] a[3]
>>>
>>> and in littleendian
>>>
>>>           msb             lsb
>>>           3333 2222 1111 0000
>>> element  a[3] a[2] a[1] a[0]
>> 
>> Right.  But if an N-bit value is stored in a register, it's assumed to
>> occupy the lsb of the register and the N-1 bits above that.  The other
>> bits in the register are don't-care.
>> 
>> E.g., leaving vectors to one side, if you have:
>> 
>>    (set (reg:HI N) (truncate:HI (reg:SI N)))
>> 
>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
>> 
>>    msb  lsb
>>    01234567
>>        VVVV
>>    xxxx4567
>> 
>> rather than:
>> 
>>    msb  lsb
>>    01234567
>>    VVVV
>>    0123xxxx
>> 
>> for both endiannesses.  The same principle applies to vectors.
>> The lsb of the register is always assumed to be significant.
>> 
>> So maybe the original patch was correct for partial-register and
>> full-register results on little-endian, but only for full-register
>> results on big-endian.
>
> Ah, ok! I think I get it. By eliminating
> 	set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
>
> using the check INTVAL (tem) != i, I'm essentially making subsequent operations 
> use (reg:V2DI n) in DI mode which is a partial register result and this
> gives me
> the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial 
> register mode, I have to make sure the correct elements coincide with
> the lsb in
> big-endian...
>
> Thanks for your input, I'll apply the offset correction for big-endian you 
> suggested. I'll respin the patch.

Thanks.  Just for avoidance of doubt, the result might be a full or
partial register, depending on the mode and target.  I was trying to
correct myself by agreeing that your original was right and mine was
wrong for big-endian if the result is a full register.

I don't know if there are existing helper functions for this kind of thing.

Richard


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 18:34         ` Richard Sandiford
@ 2013-11-06 19:42           ` Bill Schmidt
  2013-11-07 14:37           ` Tejas Belagod
  1 sibling, 0 replies; 76+ messages in thread
From: Bill Schmidt @ 2013-11-06 19:42 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Tejas Belagod, gcc-patches

On Wed, 2013-11-06 at 17:34 +0000, Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
> > Richard Sandiford wrote:
> >> Tejas Belagod <tbelagod@arm.com> writes:
> >>> Richard Sandiford wrote:
> >>>> Tejas Belagod <tbelagod@arm.com> writes:
> >>>>> +  /* This is big-endian-safe because the elements are kept in target
> >>>>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
> >>>>> +     either endian-ness.  */
> >>>>> +  if (GET_CODE (src) == VEC_SELECT
> >>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
> >>>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
> >>>>> +    {
> >>>>> +      rtx par = XEXP (src, 1);
> >>>>> +      int i;
> >>>>> +
> >>>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
> >>>>> +	{
> >>>>> +	  rtx tem = XVECEXP (par, 0, i);
> >>>>> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
> >>>>> +	    return 0;
> >>>>> +	}
> >>>>> +      return 1;
> >>>>> +    }
> >>>>> +
> >>>> I think for big endian it needs to be:
> >>>>
> >>>>     INTVAL (tem) != i + base
> >>>>
> >>>> where base is something like:
> >>>>
> >>>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
> >>>>
> >>>> E.g. a big-endian V4HI looks like:
> >>>>
> >>>>     msb          lsb
> >>>>     0000111122223333
> >>>>
> >>>> and shortening it to say V2HI only gives the low 32 bits:
> >>>>
> >>>>             msb  lsb
> >>>>             22223333
> >>> But, in this case we want
> >>>
> >>>          msb  lsb
> >>>          00001111
> >> 
> >> It depends on whether the result occupies a full register or not.
> >> I was thinking of the case where it didn't, but I realise now you were
> >> thinking of the case where it did.  And yeah, my suggestion doesn't
> >> cope with that...
> >> 
> >>> I was under the impression that the const vector parallel for vec_select 
> >>> represents the element indexes of the array in memory order.
> >>>
> >>> Therefore, in bigendian,
> >>>
> >>>           msb             lsb
> >>>           0000 1111 2222 3333
> >>> element  a[0] a[1] a[2] a[3]
> >>>
> >>> and in littleendian
> >>>
> >>>           msb             lsb
> >>>           3333 2222 1111 0000
> >>> element  a[3] a[2] a[1] a[0]
> >> 
> >> Right.  But if an N-bit value is stored in a register, it's assumed to
> >> occupy the lsb of the register and the N-1 bits above that.  The other
> >> bits in the register are don't-care.
> >> 
> >> E.g., leaving vectors to one side, if you have:
> >> 
> >>    (set (reg:HI N) (truncate:HI (reg:SI N)))
> >> 
> >> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
> >> 
> >>    msb  lsb
> >>    01234567
> >>        VVVV
> >>    xxxx4567
> >> 
> >> rather than:
> >> 
> >>    msb  lsb
> >>    01234567
> >>    VVVV
> >>    0123xxxx
> >> 
> >> for both endiannesses.  The same principle applies to vectors.
> >> The lsb of the register is always assumed to be significant.
> >> 
> >> So maybe the original patch was correct for partial-register and
> >> full-register results on little-endian, but only for full-register
> >> results on big-endian.
> >
> > Ah, ok! I think I get it. By eliminating
> > 	set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
> >
> > using the check INTVAL (tem) != i, I'm essentially making subsequent operations 
> > use (reg:V2DI n) in DI mode which is a partial register result and this
> > gives me
> > the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial 
> > register mode, I have to make sure the correct elements coincide with
> > the lsb in
> > big-endian...
> >
> > Thanks for your input, I'll apply the offset correction for big-endian you 
> > suggested. I'll respin the patch.
> 
> Thanks.  Just for avoidance of doubt, the result might be a full or
> partial register, depending on the mode and target.  I was trying to
> correct myself by agreeing that your original was right and mine was
> wrong for big-endian if the result is a full register.
> 
> I don't know if there are existing helper functions for this kind of thing.

Tejas, can you please include test cases for both possibilities?  The
existing test suite is not sufficient (your original patch does not
demonstrate regressions on powerpc64 big endian, even though we know
it's not correct).  Thanks!

Bill

> 
> Richard
> 


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-06 18:34         ` Richard Sandiford
  2013-11-06 19:42           ` Bill Schmidt
@ 2013-11-07 14:37           ` Tejas Belagod
  2013-11-07 15:15             ` Richard Sandiford
  1 sibling, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-11-07 14:37 UTC (permalink / raw)
  To: gcc-patches, rdsandiford

Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>> Richard Sandiford wrote:
>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>> Richard Sandiford wrote:
>>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>> +  /* This is big-endian-safe because the elements are kept in target
>>>>>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
>>>>>> +     either endian-ness.  */
>>>>>> +  if (GET_CODE (src) == VEC_SELECT
>>>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>>>> +    {
>>>>>> +      rtx par = XEXP (src, 1);
>>>>>> +      int i;
>>>>>> +
>>>>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>>>>>> +	{
>>>>>> +	  rtx tem = XVECEXP (par, 0, i);
>>>>>> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>>>> +	    return 0;
>>>>>> +	}
>>>>>> +      return 1;
>>>>>> +    }
>>>>>> +
>>>>> I think for big endian it needs to be:
>>>>>
>>>>>     INTVAL (tem) != i + base
>>>>>
>>>>> where base is something like:
>>>>>
>>>>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
>>>>>
>>>>> E.g. a big-endian V4HI looks like:
>>>>>
>>>>>     msb          lsb
>>>>>     0000111122223333
>>>>>
>>>>> and shortening it to say V2HI only gives the low 32 bits:
>>>>>
>>>>>             msb  lsb
>>>>>             22223333
>>>> But, in this case we want
>>>>
>>>>          msb  lsb
>>>>          00001111
>>> It depends on whether the result occupies a full register or not.
>>> I was thinking of the case where it didn't, but I realise now you were
>>> thinking of the case where it did.  And yeah, my suggestion doesn't
>>> cope with that...
>>>
>>>> I was under the impression that the const vector parallel for vec_select 
>>>> represents the element indexes of the array in memory order.
>>>>
>>>> Therefore, in bigendian,
>>>>
>>>>           msb             lsb
>>>>           0000 1111 2222 3333
>>>> element  a[0] a[1] a[2] a[3]
>>>>
>>>> and in littleendian
>>>>
>>>>           msb             lsb
>>>>           3333 2222 1111 0000
>>>> element  a[3] a[2] a[1] a[0]
>>> Right.  But if an N-bit value is stored in a register, it's assumed to
>>> occupy the lsb of the register and the N-1 bits above that.  The other
>>> bits in the register are don't-care.
>>>
>>> E.g., leaving vectors to one side, if you have:
>>>
>>>    (set (reg:HI N) (truncate:HI (reg:SI N)))
>>>
>>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
>>>
>>>    msb  lsb
>>>    01234567
>>>        VVVV
>>>    xxxx4567
>>>
>>> rather than:
>>>
>>>    msb  lsb
>>>    01234567
>>>    VVVV
>>>    0123xxxx
>>>
>>> for both endiannesses.  The same principle applies to vectors.
>>> The lsb of the register is always assumed to be significant.
>>>
>>> So maybe the original patch was correct for partial-register and
>>> full-register results on little-endian, but only for full-register
>>> results on big-endian.
>> Ah, ok! I think I get it. By eliminating
>> 	(set (reg:DI n) (vec_select:DI (reg:V2DI n) (parallel [(const_int 0)])))
>>
>> using the check INTVAL (tem) != i, I'm essentially making subsequent operations
>> use (reg:V2DI n) in DI mode which is a partial register result and this gives me
>> the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial
>> register mode, I have to make sure the correct elements coincide with the lsb in
>> big-endian...
>>
>> Thanks for your input, I'll apply the offset correction for big-endian you 
>> suggested. I'll respin the patch.
> 
> Thanks.  Just for avoidance of doubt, the result might be a full or
> partial register, depending on the mode and target.  I was trying to
> correct myself by agreeing that your original was right and mine was
> wrong for big-endian if the result is a full register.
> 

What I had in mind when I implemented this was a partial-reg result, but 
obviously it was wrong.

Sorry, I'm going to take a step back - I'm trying to figure out what a full 
register result would look like. Looking at the pattern,

   (set (reg:DI n) (vec_select:DI (reg:V2DI n) (parallel [(const_int 0)])))

the result is always a mode smaller than the src if the vec_select selects a 
subset of lanes. Plus if the src and dst are the same reg, because we're 
re-writing the src reg, wouldn't it always end up being a partial-reg?
In this case, wouldn't (reg:DI n) always represent

      msb  lsb
      22223333

Thanks,
Tejas.


> I don't know if there are existing helper functions for this kind of thing.
> 
> Richard
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-07 14:37           ` Tejas Belagod
@ 2013-11-07 15:15             ` Richard Sandiford
  2013-11-07 18:05               ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Sandiford @ 2013-11-07 15:15 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: gcc-patches

Tejas Belagod <tbelagod@arm.com> writes:
> Richard Sandiford wrote:
>> Tejas Belagod <tbelagod@arm.com> writes:
>>> Richard Sandiford wrote:
>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>> Richard Sandiford wrote:
>>>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>>> +  /* This is big-endian-safe because the elements are kept in target
>>>>>>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
>>>>>>> +     either endian-ness.  */
>>>>>>> +  if (GET_CODE (src) == VEC_SELECT
>>>>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>>>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>>>>> +    {
>>>>>>> +      rtx par = XEXP (src, 1);
>>>>>>> +      int i;
>>>>>>> +
>>>>>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>>>>>>> +	{
>>>>>>> +	  rtx tem = XVECEXP (par, 0, i);
>>>>>>> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>>>>> +	    return 0;
>>>>>>> +	}
>>>>>>> +      return 1;
>>>>>>> +    }
>>>>>>> +
>>>>>> I think for big endian it needs to be:
>>>>>>
>>>>>>     INTVAL (tem) != i + base
>>>>>>
>>>>>> where base is something like:
>>>>>>
>>>>>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
>>>>>>
>>>>>> E.g. a big-endian V4HI looks like:
>>>>>>
>>>>>>     msb          lsb
>>>>>>     0000111122223333
>>>>>>
>>>>>> and shortening it to say V2HI only gives the low 32 bits:
>>>>>>
>>>>>>             msb  lsb
>>>>>>             22223333
>>>>> But, in this case we want
>>>>>
>>>>>          msb  lsb
>>>>>          00001111
>>>> It depends on whether the result occupies a full register or not.
>>>> I was thinking of the case where it didn't, but I realise now you were
>>>> thinking of the case where it did.  And yeah, my suggestion doesn't
>>>> cope with that...
>>>>
>>>>> I was under the impression that the const vector parallel for vec_select 
>>>>> represents the element indexes of the array in memory order.
>>>>>
>>>>> Therefore, in bigendian,
>>>>>
>>>>>           msb             lsb
>>>>>           0000 1111 2222 3333
>>>>> element  a[0] a[1] a[2] a[3]
>>>>>
>>>>> and in littleendian
>>>>>
>>>>>           msb             lsb
>>>>>           3333 2222 1111 0000
>>>>> element  a[3] a[2] a[1] a[0]
>>>> Right.  But if an N-bit value is stored in a register, it's assumed to
>>>> occupy the lsb of the register and the N-1 bits above that.  The other
>>>> bits in the register are don't-care.
>>>>
>>>> E.g., leaving vectors to one side, if you have:
>>>>
>>>>    (set (reg:HI N) (truncate:HI (reg:SI N)))
>>>>
>>>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
>>>>
>>>>    msb  lsb
>>>>    01234567
>>>>        VVVV
>>>>    xxxx4567
>>>>
>>>> rather than:
>>>>
>>>>    msb  lsb
>>>>    01234567
>>>>    VVVV
>>>>    0123xxxx
>>>>
>>>> for both endiannesses.  The same principle applies to vectors.
>>>> The lsb of the register is always assumed to be significant.
>>>>
>>>> So maybe the original patch was correct for partial-register and
>>>> full-register results on little-endian, but only for full-register
>>>> results on big-endian.
>>> Ah, ok! I think I get it. By eliminating
>>> 	(set (reg:DI n) (vec_select:DI (reg:V2DI n) (parallel [(const_int 0)])))
>>>
>>> using the check INTVAL (tem) != i, I'm essentially making subsequent operations
>>> use (reg:V2DI n) in DI mode which is a partial register result and this gives me
>>> the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial
>>> register mode, I have to make sure the correct elements coincide with the lsb in
>>> big-endian...
>>>
>>> Thanks for your input, I'll apply the offset correction for big-endian you 
>>> suggested. I'll respin the patch.
>> 
>> Thanks.  Just for avoidance of doubt, the result might be a full or
>> partial register, depending on the mode and target.  I was trying to
>> correct myself by agreeing that your original was right and mine was
>> wrong for big-endian if the result is a full register.
>> 
>
> What I had in mind when I implemented this was a partial-reg result, but 
> obviously it was wrong.
>
> Sorry, I'm going to take a step back - I'm trying to figure out what a full 
> register result would look like. Looking at the pattern,
>
>    (set (reg:DI n) (vec_select:DI (reg:V2DI n) (parallel [(const_int 0)])))
>
> the result is always a mode smaller than the src if the vec_select selects a 
> subset of lanes. Plus if the src and dst are the same reg, because we're 
> re-writing the src reg, wouldn't it always end up being a partial-reg?
> In this case, wouldn't (reg:DI n) always represent
>
>       msb  lsb
>       22223333

The problem is that one reg rtx can span several hard registers.
E.g. (reg:V4HI 32) might represent one 64-bit register (no. 32),
but it might instead represent two 32-bit registers (nos. 32 and 33).
Obviously the latter's not very likely for vectors this small,
but more likely for larger ones (including on NEON IIRC).

So if we had 2 32-bit registers being treated as a V4HI, it would be:

   <--32--><--33-->
   msb          lsb
   0000111122223333
   VVVVVVVV
   00001111
   msb  lsb
   <--32-->

for big endian and:

   <--33--><--32-->
   msb          lsb
   3333222211110000
           VVVVVVVV
           11110000
           msb  lsb
           <--32-->

for little endian.
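
The two diagrams above can be sketched as a small C model. It assumes lanes
are numbered in memory order and that consecutive hard registers hold
consecutive groups of lanes, as drawn. The names lane_regno and
lane_pos_from_lsb are hypothetical helpers for illustration only; they are
not GCC functions.

```c
#include <stdbool.h>

/* Hypothetical model: a vector is spread over consecutive hard registers
   starting at BASE_REGNO, with LANES_PER_REG lanes in each register.
   LANE is the element index in memory order.  */

/* Hard register that holds LANE.  In the diagrams above this is the same
   for both endiannesses: lanes 0 and 1 of the V4HI live in register 32.  */
int
lane_regno (int base_regno, int lanes_per_reg, int lane)
{
  return base_regno + lane / lanes_per_reg;
}

/* Position of LANE inside its register, counted in lanes from the lsb.
   This is where the endiannesses differ: on big-endian the first lane in
   memory order sits at the msb end of the register.  */
int
lane_pos_from_lsb (bool big_endian, int lanes_per_reg, int lane)
{
  int idx = lane % lanes_per_reg;
  return big_endian ? lanes_per_reg - 1 - idx : idx;
}
```

With base register 32 and two lanes per register, lanes 0 and 1 land in
register 32 in both cases; only their distance from the lsb changes.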

Thanks,
Richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-07 15:15             ` Richard Sandiford
@ 2013-11-07 18:05               ` Tejas Belagod
  2013-11-10 20:59                 ` Richard Sandiford
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-11-07 18:05 UTC (permalink / raw)
  To: gcc-patches, rdsandiford

Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>> Richard Sandiford wrote:
>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>> Richard Sandiford wrote:
>>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>> Richard Sandiford wrote:
>>>>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>>>> +  /* This is big-endian-safe because the elements are kept in target
>>>>>>>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
>>>>>>>> +     either endian-ness.  */
>>>>>>>> +  if (GET_CODE (src) == VEC_SELECT
>>>>>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>>>>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>>>>>> +    {
>>>>>>>> +      rtx par = XEXP (src, 1);
>>>>>>>> +      int i;
>>>>>>>> +
>>>>>>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>>>>>>>> +	{
>>>>>>>> +	  rtx tem = XVECEXP (par, 0, i);
>>>>>>>> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>>>>>> +	    return 0;
>>>>>>>> +	}
>>>>>>>> +      return 1;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>> I think for big endian it needs to be:
>>>>>>>
>>>>>>>     INTVAL (tem) != i + base
>>>>>>>
>>>>>>> where base is something like:
>>>>>>>
>>>>>>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
>>>>>>>
>>>>>>> E.g. a big-endian V4HI looks like:
>>>>>>>
>>>>>>>     msb          lsb
>>>>>>>     0000111122223333
>>>>>>>
>>>>>>> and shortening it to say V2HI only gives the low 32 bits:
>>>>>>>
>>>>>>>             msb  lsb
>>>>>>>             22223333
>>>>>> But, in this case we want
>>>>>>
>>>>>>          msb  lsb
>>>>>>          00001111
>>>>> It depends on whether the result occupies a full register or not.
>>>>> I was thinking of the case where it didn't, but I realise now you were
>>>>> thinking of the case where it did.  And yeah, my suggestion doesn't
>>>>> cope with that...
>>>>>
>>>>>> I was under the impression that the const vector parallel for vec_select 
>>>>>> represents the element indexes of the array in memory order.
>>>>>>
>>>>>> Therefore, in bigendian,
>>>>>>
>>>>>>           msb             lsb
>>>>>>           0000 1111 2222 3333
>>>>>> element  a[0] a[1] a[2] a[3]
>>>>>>
>>>>>> and in littleendian
>>>>>>
>>>>>>           msb             lsb
>>>>>>           3333 2222 1111 0000
>>>>>> element  a[3] a[2] a[1] a[0]
>>>>> Right.  But if an N-bit value is stored in a register, it's assumed to
>>>>> occupy the lsb of the register and the N-1 bits above that.  The other
>>>>> bits in the register are don't-care.
>>>>>
>>>>> E.g., leaving vectors to one side, if you have:
>>>>>
>>>>>    (set (reg:HI N) (truncate:HI (reg:SI N)))
>>>>>
>>>>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
>>>>>
>>>>>    msb  lsb
>>>>>    01234567
>>>>>        VVVV
>>>>>    xxxx4567
>>>>>
>>>>> rather than:
>>>>>
>>>>>    msb  lsb
>>>>>    01234567
>>>>>    VVVV
>>>>>    0123xxxx
>>>>>
>>>>> for both endiannesses.  The same principle applies to vectors.
>>>>> The lsb of the register is always assumed to be significant.
>>>>>
>>>>> So maybe the original patch was correct for partial-register and
>>>>> full-register results on little-endian, but only for full-register
>>>>> results on big-endian.
>>>> Ah, ok! I think I get it. By eliminating
>>>> 	(set (reg:DI n) (vec_select:DI (reg:V2DI n) (parallel [(const_int 0)])))
>>>>
>>>> using the check INTVAL (tem) != i, I'm essentially making subsequent operations
>>>> use (reg:V2DI n) in DI mode which is a partial register result and this gives me
>>>> the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial
>>>> register mode, I have to make sure the correct elements coincide with the lsb in
>>>> big-endian...
>>>>
>>>> Thanks for your input, I'll apply the offset correction for big-endian you 
>>>> suggested. I'll respin the patch.
>>> Thanks.  Just for avoidance of doubt, the result might be a full or
>>> partial register, depending on the mode and target.  I was trying to
>>> correct myself by agreeing that your original was right and mine was
>>> wrong for big-endian if the result is a full register.
>>>
>> What I had in mind when I implemented this was a partial-reg result, but 
>> obviously it was wrong.
>>
>> Sorry, I'm going to take a step back - I'm trying to figure out what a full 
>> register result would look like. Looking at the pattern,
>>
>>    (set (reg:DI n) (vec_select:DI (reg:V2DI n) (parallel [(const_int 0)])))
>>
>> the result is always a mode smaller than the src if the vec_select selects a 
>> subset of lanes. Plus if the src and dst are the same reg, because we're 
>> re-writing the src reg, wouldn't it always end up being a partial-reg?
>> In this case, wouldn't (reg:DI n) always represent
>>
>>       msb  lsb
>>       22223333
> 
> The problem is that one reg rtx can span several hard registers.
> E.g. (reg:V4HI 32) might represent one 64-bit register (no. 32),
> but it might instead represent two 32-bit registers (nos. 32 and 33).
> Obviously the latter's not very likely for vectors this small,
> but more likely for larger ones (including on NEON IIRC).
> 
> So if we had 2 32-bit registers being treated as a V4HI, it would be:
> 
>    <--32--><--33-->
>    msb          lsb
>    0000111122223333
>    VVVVVVVV
>    00001111
>    msb  lsb
>    <--32-->
> 
> for big endian and:
> 
>    <--33--><--32-->
>    msb          lsb
>    3333222211110000
>            VVVVVVVV
>            11110000
>            msb  lsb
>            <--32-->
> 
> for little endian.

Ah, ok, that makes things clearer. Thanks for that.

I can't find any helper function that figures out if we're writing partial or 
full result regs. Would something like

     REGNO (src) == REGNO (dst)
     && HARD_REGNO_NREGS (REGNO (src), GET_MODE (src)) == 1
     && HARD_REGNO_NREGS (REGNO (dst), GET_MODE (dst)) == 1

be a sane check for partial result regs?

Thanks,
Tejas.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-07 18:05               ` Tejas Belagod
@ 2013-11-10 20:59                 ` Richard Sandiford
  2013-11-27 17:59                   ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Sandiford @ 2013-11-10 20:59 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: gcc-patches

Tejas Belagod <tbelagod@arm.com> writes:
>> The problem is that one reg rtx can span several hard registers.
>> E.g. (reg:V4HI 32) might represent one 64-bit register (no. 32),
>> but it might instead represent two 32-bit registers (nos. 32 and 33).
>> Obviously the latter's not very likely for vectors this small,
>> but more likely for larger ones (including on NEON IIRC).
>> 
>> So if we had 2 32-bit registers being treated as a V4HI, it would be:
>> 
>>    <--32--><--33-->
>>    msb          lsb
>>    0000111122223333
>>    VVVVVVVV
>>    00001111
>>    msb  lsb
>>    <--32-->
>> 
>> for big endian and:
>> 
>>    <--33--><--32-->
>>    msb          lsb
>>    3333222211110000
>>            VVVVVVVV
>>            11110000
>>            msb  lsb
>>            <--32-->
>> 
>> for little endian.
>
> Ah, ok, that makes things clearer. Thanks for that.
>
> I can't find any helper function that figures out if we're writing partial or 
> full result regs. Would something like
>
>      REGNO (src) == REGNO (dst)
>      && HARD_REGNO_NREGS (REGNO (src), GET_MODE (src)) == 1
>      && HARD_REGNO_NREGS (REGNO (dst), GET_MODE (dst)) == 1
>
> be a sane check for partial result regs?

Yeah, that should work.  I think a more general alternative would be:

  simplify_subreg_regno (REGNO (src), GET_MODE (src),
                         offset, GET_MODE (dst)) == (int) REGNO (dst)

where:

  offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0, 0))

That offset is the byte offset of the first selected element from the
start of a vector in memory, which is also the way that SUBREG_BYTEs
are counted.  For little-endian it gives the offset of the lsb of the
slice, while for big-endian it gives the offset of the msb (which is
also how SUBREG_BYTEs work).

The simplify_subreg_regno should cope with both single-register vectors
and multi-register vectors.
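
As a rough illustration of why the byte offset is enough, here is a toy C
model of the check. It is only a sketch under simplifying assumptions: all
hard registers have the same size, and a value narrower than a register must
sit at the register's lsb. toy_subreg_regno is a hypothetical stand-in and
does not reproduce the real simplify_subreg_regno logic.

```c
#include <stdbool.h>

/* Byte offset of the first selected lane from the start of the vector in
   memory: element size times the first index in the selector.  */
long
lane_byte_offset (int unit_size, int first_lane)
{
  return (long) unit_size * first_lane;
}

/* Toy model (assumption, not GCC code): return the hard register that a
   value of OUTER_BYTES at byte OFFSET of a multi-register value would
   occupy, or -1 if it does not line up with a register's lsb.  */
int
toy_subreg_regno (bool big_endian, int base_regno, int reg_bytes,
                  int outer_bytes, long offset)
{
  long within = offset % reg_bytes;
  int regno = base_regno + (int) (offset / reg_bytes);

  /* Full-register results must start on a register boundary.  */
  if (outer_bytes >= reg_bytes)
    return within == 0 ? regno : -1;

  /* Partial-register results must end at the lsb: the last bytes of the
     register on big-endian, the first bytes on little-endian.  */
  if (big_endian)
    return within + outer_bytes == reg_bytes ? regno : -1;
  return within == 0 ? regno : -1;
}
```

In this model, selecting lane 0 of a V2DI held in one 16-byte register is a
no-op on little-endian (offset 0) but not on big-endian, where lane 1
(offset 8) is the one that coincides with the lsb.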

Thanks,
Richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-10 20:59                 ` Richard Sandiford
@ 2013-11-27 17:59                   ` Tejas Belagod
  2013-11-28 11:33                     ` Richard Sandiford
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-11-27 17:59 UTC (permalink / raw)
  To: Bill Schmidt, gcc-patches, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 3204 bytes --]

Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>>> The problem is that one reg rtx can span several hard registers.
>>> E.g. (reg:V4HI 32) might represent one 64-bit register (no. 32),
>>> but it might instead represent two 32-bit registers (nos. 32 and 33).
>>> Obviously the latter's not very likely for vectors this small,
>>> but more likely for larger ones (including on NEON IIRC).
>>>
>>> So if we had 2 32-bit registers being treated as a V4HI, it would be:
>>>
>>>    <--32--><--33-->
>>>    msb          lsb
>>>    0000111122223333
>>>    VVVVVVVV
>>>    00001111
>>>    msb  lsb
>>>    <--32-->
>>>
>>> for big endian and:
>>>
>>>    <--33--><--32-->
>>>    msb          lsb
>>>    3333222211110000
>>>            VVVVVVVV
>>>            11110000
>>>            msb  lsb
>>>            <--32-->
>>>
>>> for little endian.
>> Ah, ok, that makes things clearer. Thanks for that.
>>
>> I can't find any helper function that figures out if we're writing partial or 
>> full result regs. Would something like
>>
>>      REGNO (src) == REGNO (dst)
>>      && HARD_REGNO_NREGS (REGNO (src), GET_MODE (src)) == 1
>>      && HARD_REGNO_NREGS (REGNO (dst), GET_MODE (dst)) == 1
>>
>> be a sane check for partial result regs?
> 
> Yeah, that should work.  I think a more general alternative would be:
> 
>   simplify_subreg_regno (REGNO (src), GET_MODE (src),
>                          offset, GET_MODE (dst)) == (int) REGNO (dst)
> 
> where:
> 
>   offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0, 0))
> 
> That offset is the byte offset of the first selected element from the
> start of a vector in memory, which is also the way that SUBREG_BYTEs
> are counted.  For little-endian it gives the offset of the lsb of the
> slice, while for big-endian it gives the offset of the msb (which is
> also how SUBREG_BYTEs work).
> 
> The simplify_subreg_regno should cope with both single-register vectors
> and multi-register vectors.

Sorry for the delayed response to this.

Thanks for the tip. Here's an improved patch that implements the
simplify_subreg_regno () method of eliminating redundant moves. Regarding the
test case, I failed to get the ppc back-end to generate the RTL pattern that this
patch checks for. I can easily write a test case for aarch64 (big and little
endian) along these lines

typedef float float32x4_t __attribute__ ((__vector_size__ (16)));

float foo_be (float32x4_t x)
{
   return x[3];
}

float foo_le (float32x4_t x)
{
   return x[0];
}

where I know that the vector indexing will generate a vec_select on the same src 
and dst regs that could be optimized away and hence test it. But I'm struggling 
to get a test case that the ppc altivec back-end will generate such a 
vec_select for. I see that altivec does not define vec_extract, so a simple 
indexing like this seems to happen via memory. Also, I don't know enough about 
the ppc PCS or architecture to write a test that will check for this 
optimization opportunity on same src and dst hard-registers. Any hints?

This patch has been bootstrapped on x86_64 and regressed on aarch64-none-elf and 
aarch64_be-none-elf.

Thanks for your patience,
Tejas.

[-- Attachment #2: rm.txt --]
[-- Type: text/plain, Size: 857 bytes --]

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 0cd0c7e..ca25ce5 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
       dst = SUBREG_REG (dst);
     }
 
+  /* It is a NOOP if destination overlaps with selected src vector
+     elements.  */
+  if (GET_CODE (src) == VEC_SELECT
+      && REG_P (XEXP (src, 0)) && REG_P (dst)
+      && HARD_REGISTER_P (XEXP (src, 0))
+      && HARD_REGISTER_P (dst))
+    {
+      rtx par = XEXP (src, 1);
+      rtx src0 = XEXP (src, 0);
+      HOST_WIDE_INT offset =
+	GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0));
+
+      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
+				    offset, GET_MODE (dst)) == (int)REGNO (dst);
+    }
+
   return (REG_P (src) && REG_P (dst)
 	  && REGNO (src) == REGNO (dst));
 }

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-27 17:59                   ` Tejas Belagod
@ 2013-11-28 11:33                     ` Richard Sandiford
  2013-12-04 16:07                       ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Sandiford @ 2013-11-28 11:33 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: Bill Schmidt, gcc-patches

Tejas Belagod <tbelagod@arm.com> writes:
> Richard Sandiford wrote:
>> Tejas Belagod <tbelagod@arm.com> writes:
>>>> The problem is that one reg rtx can span several hard registers.
>>>> E.g. (reg:V4HI 32) might represent one 64-bit register (no. 32),
>>>> but it might instead represent two 32-bit registers (nos. 32 and 33).
>>>> Obviously the latter's not very likely for vectors this small,
>>>> but more likely for larger ones (including on NEON IIRC).
>>>>
>>>> So if we had 2 32-bit registers being treated as a V4HI, it would be:
>>>>
>>>>    <--32--><--33-->
>>>>    msb          lsb
>>>>    0000111122223333
>>>>    VVVVVVVV
>>>>    00001111
>>>>    msb  lsb
>>>>    <--32-->
>>>>
>>>> for big endian and:
>>>>
>>>>    <--33--><--32-->
>>>>    msb          lsb
>>>>    3333222211110000
>>>>            VVVVVVVV
>>>>            11110000
>>>>            msb  lsb
>>>>            <--32-->
>>>>
>>>> for little endian.
>>> Ah, ok, that makes things clearer. Thanks for that.
>>>
>>> I can't find any helper function that figures out if we're writing partial or
>>> full result regs. Would something like
>>>
>>>      REGNO (src) == REGNO (dst)
>>>      && HARD_REGNO_NREGS (REGNO (src), GET_MODE (src)) == 1
>>>      && HARD_REGNO_NREGS (REGNO (dst), GET_MODE (dst)) == 1
>>>
>>> be a sane check for partial result regs?
>> 
>> Yeah, that should work.  I think a more general alternative would be:
>> 
>>   simplify_subreg_regno (REGNO (src), GET_MODE (src),
>>                          offset, GET_MODE (dst)) == (int) REGNO (dst)
>> 
>> where:
>> 
>>   offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0, 0))
>> 
>> That offset is the byte offset of the first selected element from the
>> start of a vector in memory, which is also the way that SUBREG_BYTEs
>> are counted.  For little-endian it gives the offset of the lsb of the
>> slice, while for big-endian it gives the offset of the msb (which is
>> also how SUBREG_BYTEs work).
>> 
>> The simplify_subreg_regno should cope with both single-register vectors
>> and multi-register vectors.
>
> Sorry for the delayed response to this.
>
> Thanks for the tip. Here's an improved patch that implements the 
> simplify_subreg_regno () method of eliminating redundant moves. Regarding the 
> test case, I failed to get the ppc back-end to generate RTL pattern that this 
> patch checks for. I can easily write a test case for aarch64(big and little 
> endian) on these lines
>
> typedef float float32x4_t __attribute__ ((__vector_size__ (16)));
>
> float foo_be (float32x4_t x)
> {
>    return x[3];
> }
>
> float foo_le (float32x4_t x)
> {
>    return x[0];
> }
>
> where I know that the vector indexing will generate a vec_select on
> the same src and dst regs that could be optimized away and hence test
> it. But I'm struggling to get a test case that the ppc altivec
> back-end will generate such a vec_select for. I see that altivec does
> not define vec_extract, so a simple indexing like this seems to happen
> via memory. Also, I don't know enough about the ppc PCS or
> architecture to write a test that will check for this optimization
> opportunity on same src and dst hard-registers. Any hints?

Me neither, sorry.

FWIW, the MIPS tests:

  typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
  void bar (float);
  void foo_be (float32x2_t x) { bar (x[1]); }
  void foo_le (float32x2_t x) { bar (x[0]); }

also exercise it, but I don't think they add anything over the aarch64
versions.  I can add them to the testsuite anyway if it helps though.

> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 0cd0c7e..ca25ce5 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
>        dst = SUBREG_REG (dst);
>      }
>  
> +  /* It is a NOOP if destination overlaps with selected src vector
> +     elements.  */
> +  if (GET_CODE (src) == VEC_SELECT
> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
> +      && HARD_REGISTER_P (XEXP (src, 0))
> +      && HARD_REGISTER_P (dst))
> +    {
> +      rtx par = XEXP (src, 1);
> +      rtx src0 = XEXP (src, 0);
> +      HOST_WIDE_INT offset =
> +	GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0));
> +
> +      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
> +				    offset, GET_MODE (dst)) == (int)REGNO (dst);
> +    }
> +

Since this also (correctly) triggers for vector results, we need to keep
the check for consecutive indices that you had originally.  (It's always
the first index that should be used for the simplify_subreg_regno though.)
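
A minimal sketch of that consecutive-index check in plain C, assuming the
selector has already been reduced to an array of integer lane numbers. The
name indices_consecutive_p is hypothetical and the function stands apart
from the RTL accessors used in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if SEL[0..N-1] selects consecutive lanes starting at SEL[0],
   i.e. SEL[i] == SEL[0] + i for every i.  A selector of length 0 or 1 is
   trivially consecutive.  */
bool
indices_consecutive_p (const int *sel, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (sel[i] != sel[0] + (int) i)
      return false;
  return true;
}
```

Only when this holds does the byte offset of the first index describe the
whole selected slice, which is why the first index alone is passed on to the
register check.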

Looks good to me otherwise, thanks.

Richard.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-11-28 11:33                     ` Richard Sandiford
@ 2013-12-04 16:07                       ` Tejas Belagod
  2013-12-04 16:14                         ` H.J. Lu
                                           ` (3 more replies)
  0 siblings, 4 replies; 76+ messages in thread
From: Tejas Belagod @ 2013-12-04 16:07 UTC (permalink / raw)
  To: Bill Schmidt, gcc-patches, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 5488 bytes --]

Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>> Richard Sandiford wrote:
>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>> The problem is that one reg rtx can span several hard registers.
>>>>> E.g. (reg:V4HI 32) might represent one 64-bit register (no. 32),
>>>>> but it might instead represent two 32-bit registers (nos. 32 and 33).
>>>>> Obviously the latter's not very likely for vectors this small,
>>>>> but more likely for larger ones (including on NEON IIRC).
>>>>>
>>>>> So if we had 2 32-bit registers being treated as a V4HI, it would be:
>>>>>
>>>>>    <--32--><--33-->
>>>>>    msb          lsb
>>>>>    0000111122223333
>>>>>    VVVVVVVV
>>>>>    00001111
>>>>>    msb  lsb
>>>>>    <--32-->
>>>>>
>>>>> for big endian and:
>>>>>
>>>>>    <--33--><--32-->
>>>>>    msb          lsb
>>>>>    3333222211110000
>>>>>            VVVVVVVV
>>>>>            11110000
>>>>>            msb  lsb
>>>>>            <--32-->
>>>>>
>>>>> for little endian.
>>>> Ah, ok, that makes things clearer. Thanks for that.
>>>>
>>>> I can't find any helper function that figures out if we're writing partial or
>>>> full result regs. Would something like
>>>>
>>>>      REGNO (src) == REGNO (dst)
>>>>      && HARD_REGNO_NREGS (REGNO (src), GET_MODE (src)) == 1
>>>>      && HARD_REGNO_NREGS (REGNO (dst), GET_MODE (dst)) == 1
>>>>
>>>> be a sane check for partial result regs?
>>> Yeah, that should work.  I think a more general alternative would be:
>>>
>>>   simplify_subreg_regno (REGNO (src), GET_MODE (src),
>>>                          offset, GET_MODE (dst)) == (int) REGNO (dst)
>>>
>>> where:
>>>
>>>   offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0, 0))
>>>
>>> That offset is the byte offset of the first selected element from the
>>> start of a vector in memory, which is also the way that SUBREG_BYTEs
>>> are counted.  For little-endian it gives the offset of the lsb of the
>>> slice, while for big-endian it gives the offset of the msb (which is
>>> also how SUBREG_BYTEs work).
>>>
>>> The simplify_subreg_regno should cope with both single-register vectors
>>> and multi-register vectors.
>> Sorry for the delayed response to this.
>>
>> Thanks for the tip. Here's an improved patch that implements the 
>> simplify_subreg_regno () method of eliminating redundant moves. Regarding the 
>> test case, I failed to get the ppc back-end to generate RTL pattern that this 
>> patch checks for. I can easily write a test case for aarch64(big and little 
>> endian) on these lines
>>
>> typedef float float32x4_t __attribute__ ((__vector_size__ (16)));
>>
>> float foo_be (float32x4_t x)
>> {
>>    return x[3];
>> }
>>
>> float foo_le (float32x4_t x)
>> {
>>    return x[0];
>> }
>>
>> where I know that the vector indexing will generate a vec_select on
>> the same src and dst regs that could be optimized away and hence test
>> it. But I'm struggling to get a test case that the ppc altivec
>> back-end will generate such a vec_select for. I see that altivec does
>> not define vec_extract, so a simple indexing like this seems to happen
>> via memory. Also, I don't know enough about the ppc PCS or
>> architecture to write a test that will check for this optimization
>> opportunity on same src and dst hard-registers. Any hints?
> 
> Me neither, sorry.
> 
> FWIW, the MIPS tests:
> 
>   typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
>   void bar (float);
>   void foo_be (float32x2_t x) { bar (x[1]); }
>   void foo_le (float32x2_t x) { bar (x[0]); }
> 
> also exercise it, but I don't think they add anything over the aarch64
> versions.  I can add them to the testsuite anyway if it helps though.
> 
>> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
>> index 0cd0c7e..ca25ce5 100644
>> --- a/gcc/rtlanal.c
>> +++ b/gcc/rtlanal.c
>> @@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
>>        dst = SUBREG_REG (dst);
>>      }
>>  
>> +  /* It is a NOOP if destination overlaps with selected src vector
>> +     elements.  */
>> +  if (GET_CODE (src) == VEC_SELECT
>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>> +      && HARD_REGISTER_P (XEXP (src, 0))
>> +      && HARD_REGISTER_P (dst))
>> +    {
>> +      rtx par = XEXP (src, 1);
>> +      rtx src0 = XEXP (src, 0);
>> +      HOST_WIDE_INT offset =
>> +	GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0));
>> +
>> +      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
>> +				    offset, GET_MODE (dst)) == (int)REGNO (dst);
>> +    }
>> +
> 
> Since this also (correctly) triggers for vector results, we need to keep
> the check for consecutive indices that you had originally.  (It's always
> the first index that should be used for the simplify_subreg_regno though.)
> 
> Looks good to me otherwise, thanks.

Thanks Richard. Here is a revised patch. Sorry about the delay - I was
investigating to make sure an LRA ICE I was seeing on aarch64 was unrelated to
this patch. I've added a test case that I expect to pass for aarch64. I've also
added the tests that you suggested for MIPS, but the dump scan is restricted to
aarch64 because I'm not sure which optimizations happen on MIPS.

OK for trunk?

Thanks,
Tejas.

2013-12-04  Tejas Belagod  <tejas.belagod@arm.com>

gcc/
	* rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select
	for overlapping register lanes.

gcc/testsuite/
	* gcc.dg/vect/vect-nop-move.c: New.


[-- Attachment #2: rm-1.txt --]
[-- Type: text/plain, Size: 2420 bytes --]

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 0cd0c7e..e1388c8 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1180,6 +1180,26 @@ set_noop_p (const_rtx set)
       dst = SUBREG_REG (dst);
     }
 
+  /* It is a NOOP if destination overlaps with selected src vector
+     elements.  */
+  if (GET_CODE (src) == VEC_SELECT
+      && REG_P (XEXP (src, 0)) && REG_P (dst)
+      && HARD_REGISTER_P (XEXP (src, 0))
+      && HARD_REGISTER_P (dst))
+    {
+      int i;
+      rtx par = XEXP (src, 1);
+      rtx src0 = XEXP (src, 0);
+      int c0 = INTVAL (XVECEXP (par, 0, 0));
+      HOST_WIDE_INT offset = GET_MODE_UNIT_SIZE (GET_MODE (src0)) * c0;
+
+      for (i = 1; i < XVECLEN (par, 0); i++)
+	if (INTVAL (XVECEXP (par, 0, i)) != c0 + i)
+	  return 0;
+      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
+				    offset, GET_MODE (dst)) == (int)REGNO (dst);
+    }
+
   return (REG_P (src) && REG_P (dst)
 	  && REGNO (src) == REGNO (dst));
 }
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
new file mode 100644
index 0000000..1941933
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */ 
+/* { dg-require-effective-target vect_float } */
+/* { dg-options "-O3 -fdump-rtl-combine-details" } */
+
+extern void abort (void);
+
+#define NOINLINE __attribute__((noinline))
+
+typedef float float32x4_t __attribute__ ((__vector_size__ (16)));
+typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
+
+NOINLINE float
+foo32x4_be (float32x4_t x)
+{
+  return x[3];
+}
+
+NOINLINE float
+foo32x4_le (float32x4_t x)
+{
+  return x[0];
+}
+
+NOINLINE float
+bar (float a)
+{
+  return a;
+}
+
+NOINLINE float
+foo32x2_be (float32x2_t x)
+{
+  return bar (x[1]);
+}
+
+NOINLINE float
+foo32x2_le (float32x2_t x)
+{
+  return bar (x[0]);
+}
+
+int
+main()
+{
+  float32x4_t a = { 0.0f, 1.0f, 2.0f, 3.0f };
+  float32x2_t b = { 0.0f, 1.0f };
+
+  if (foo32x4_be (a) != 3.0f)
+    abort ();
+
+  if (foo32x4_le (a) != 0.0f)
+    abort ();
+
+  if (foo32x2_be (b) != 1.0f)
+    abort ();
+
+  if (foo32x2_le (b) != 0.0f)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target aarch64*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "combine" } } */
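
For readers following the thread, the selector handling in the revised hunk
can be modelled outside GCC. The sketch below is only an illustration: the
plain int array stands in for the PARALLEL of the vec_select, and the names
lanes_consecutive_p and first_lane_byte_offset are hypothetical helpers, not
GCC functions. It mirrors the two steps of the patch: every lane after the
first must equal c0 + i, and the byte offset handed to the subreg check is
the unit size times the first lane index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PARALLEL of a vec_select: SEL is a plain
   array of N lane indices.  Returns true if the lanes are consecutive,
   i.e. sel[i] == sel[0] + i for every i, as the loop in the patch checks.  */
static bool
lanes_consecutive_p (const int *sel, size_t n)
{
  if (n == 0)
    return false;
  for (size_t i = 1; i < n; i++)
    if (sel[i] != sel[0] + (int) i)
      return false;
  return true;
}

/* Byte offset of the first selected lane from the start of the vector,
   counted in memory order (the same way SUBREG_BYTE is counted).
   UNIT_SIZE is the size in bytes of one vector element.  */
static long
first_lane_byte_offset (const int *sel, int unit_size)
{
  assert (unit_size > 0);
  return (long) unit_size * sel[0];
}
```

With 4-byte elements, selecting lane 3 alone gives a byte offset of 12 and
selecting lanes {2, 3} gives 8; a selector such as {0, 2} is rejected before
any register comparison is attempted.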

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-04 16:07                       ` Tejas Belagod
@ 2013-12-04 16:14                         ` H.J. Lu
  2013-12-04 17:29                           ` Jeff Law
  2013-12-05 22:38                           ` Jakub Jelinek
  2013-12-04 17:36                         ` Richard Sandiford
                                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 76+ messages in thread
From: H.J. Lu @ 2013-12-04 16:14 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: Bill Schmidt, gcc-patches, Richard Sandiford

On Wed, Dec 4, 2013 at 8:06 AM, Tejas Belagod <tbelagod@arm.com> wrote:
> Thanks Richard. Here is a revised patch. Sorry about the delay - I was
> investigating to make sure an LRA ICE I was seeing on aarch64 was unrelated
> to this patch. I've added a test case that I expect to pass for aarch64.
> I've also added the tests that you suggested for MIPS, but haven't checked
> for the target because I'm not sure what optimizations happen on MIPS.
>
> OK for trunk?
>
> Thanks,
> Tejas.
>
> 2013-12-04  Tejas Belagod  <tejas.belagod@arm.com>
>
>
> gcc/
>         * rtlanal.c (set_noop_p): Return nonzero in case of redundant
> vec_select
>         for overlapping register lanes.
>
> testsuite/
>         * config/gcc.dg/vect/vect-nop-move.c: New.
>
>
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 0cd0c7e..e1388c8 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1180,6 +1180,26 @@ set_noop_p (const_rtx set)
>        dst = SUBREG_REG (dst);
>      }
>
> +  /* It is a NOOP if destination overlaps with selected src vector
> +     elements.  */
> +  if (GET_CODE (src) == VEC_SELECT
> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
> +      && HARD_REGISTER_P (XEXP (src, 0))
> +      && HARD_REGISTER_P (dst))
> +    {
> +      int i;
> +      rtx par = XEXP (src, 1);
> +      rtx src0 = XEXP (src, 0);
> +      int c0 = INTVAL (XVECEXP (par, 0, 0));
> +      HOST_WIDE_INT offset = GET_MODE_UNIT_SIZE (GET_MODE (src0)) * c0;
> +
> +      for (i = 1; i < XVECLEN (par, 0); i++)
> +       if (INTVAL (XVECEXP (par, 0, i)) != c0 + i)
> +         return 0;
> +      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
> +                                   offset, GET_MODE (dst)) == (int)REGNO
> (dst);
> +    }
> +
>    return (REG_P (src) && REG_P (dst)
>           && REGNO (src) == REGNO (dst));
>  }
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
> b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
> new file mode 100644
> index 0000000..1941933
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
> @@ -0,0 +1,64 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-options "-O3 -fdump-rtl-combine-details" } */
> +
> +extern void abort (void);
> +
> +#define NOINLINE __attribute__((noinline))
> +
> +typedef float float32x4_t __attribute__ ((__vector_size__ (16)));
> +typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
> +
> +NOINLINE float
> +foo32x4_be (float32x4_t x)
> +{
> +  return x[3];
> +}
> +
> +NOINLINE float
> +foo32x4_le (float32x4_t x)
> +{
> +  return x[0];
> +}
> +
> +NOINLINE float
> +bar (float a)
> +{
> +  return a;
> +}
> +
> +NOINLINE float
> +foo32x2_be (float32x2_t x)
> +{
> +  return bar (x[1]);
> +}
> +
> +NOINLINE float
> +foo32x2_le (float32x2_t x)
> +{
> +  return bar (x[0]);
> +}
> +
> +int
> +main()
> +{
> +  float32x4_t a = { 0.0f, 1.0f, 2.0f, 3.0f };
> +  float32x2_t b = { 0.0f, 1.0f };
> +
> +  if (foo32x4_be (a) != 3.0f)
> +    abort ();
> +
> +  if (foo32x4_le (a) != 0.0f)
> +    abort ();
> +
> +  if (foo32x2_be (b) != 1.0f)
> +    abort ();
> +
> +  if (foo32x2_le (b) != 0.0f)
> +    abort ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
> aarch64*-*-* } } } */

Any particular reason why it doesn't work for x86?

> +/* { dg-final { cleanup-rtl-dump "combine" } } */

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-04 16:14                         ` H.J. Lu
@ 2013-12-04 17:29                           ` Jeff Law
  2013-12-04 17:31                             ` H.J. Lu
  2013-12-05 22:38                           ` Jakub Jelinek
  1 sibling, 1 reply; 76+ messages in thread
From: Jeff Law @ 2013-12-04 17:29 UTC (permalink / raw)
  To: H.J. Lu, Tejas Belagod; +Cc: Bill Schmidt, gcc-patches, Richard Sandiford

On 12/04/13 09:14, H.J. Lu wrote:

>> +
>> +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
>> aarch64*-*-* } } } */
>
> Any particular reason why it doesn't work for x86?
I don't think so.  I'm pretty sure Tejas is focused on ARM platforms for 
the obvious reason.

jeff

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-04 17:29                           ` Jeff Law
@ 2013-12-04 17:31                             ` H.J. Lu
  2013-12-05 13:17                               ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-04 17:31 UTC (permalink / raw)
  To: Jeff Law; +Cc: Tejas Belagod, Bill Schmidt, gcc-patches, Richard Sandiford

On Wed, Dec 4, 2013 at 9:29 AM, Jeff Law <law@redhat.com> wrote:
> On 12/04/13 09:14, H.J. Lu wrote:
>
>>> +
>>> +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
>>> aarch64*-*-* } } } */
>>
>>
>> Any particular reason why it doesn't work for x86?
>
> I don't think so.  I'm pretty sure Tejas is focused on ARM platforms for the
> obvious reason.
>

Then please add "i?86-*-* x86_64-*-*".

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-04 16:07                       ` Tejas Belagod
  2013-12-04 16:14                         ` H.J. Lu
@ 2013-12-04 17:36                         ` Richard Sandiford
  2013-12-04 20:04                         ` Jeff Law
  2020-04-22 16:43                         ` [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279] (was: [Patch, RTL] Eliminate redundant vec_select moves) Thomas Schwinge
  3 siblings, 0 replies; 76+ messages in thread
From: Richard Sandiford @ 2013-12-04 17:36 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: Bill Schmidt, gcc-patches

Tejas Belagod <tbelagod@arm.com> writes:
> Richard Sandiford wrote:
>> Tejas Belagod <tbelagod@arm.com> writes:
>>> Richard Sandiford wrote:
>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>> The problem is that one reg rtx can span several hard registers.
>>>>>> E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
>>>>>> but it might instead represent two 32-bit registers (nos. 32 and 33).
>>>>>> Obviously the latter's not very likely for vectors this small,
>>>>>> but more likely for larger ones (including on NEON IIRC).
>>>>>>
>>>>>> So if we had 2 32-bit registers being treated as a V4HI, it would be:
>>>>>>
>>>>>>    <--32--><--33-->
>>>>>>    msb          lsb
>>>>>>    0000111122223333
>>>>>>    VVVVVVVV
>>>>>>    00001111
>>>>>>    msb  lsb
>>>>>>    <--32-->
>>>>>>
>>>>>> for big endian and:
>>>>>>
>>>>>>    <--33--><--32-->
>>>>>>    msb          lsb
>>>>>>    3333222211110000
>>>>>>            VVVVVVVV
>>>>>>            11110000
>>>>>>            msb  lsb
>>>>>>            <--32-->
>>>>>>
>>>>>> for little endian.
>>>>> Ah, ok, that makes things clearer. Thanks for that.
>>>>>
>>>>> I can't find any helper function that figures out if we're writing
>>>>> partial or
>>>>> full result regs. Would something like
>>>>>
>>>>>      REGNO (src) == REGNO (dst) &&
>>>>>      HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1
>>>>>
>>>>> be a sane check for partial result regs?
>>>> Yeah, that should work.  I think a more general alternative would be:
>>>>
>>>>   simplify_subreg_regno (REGNO (src), GET_MODE (src),
>>>>                          offset, GET_MODE (dst)) == (int) REGNO (dst)
>>>>
>>>> where:
>>>>
>>>>   offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0, 0))
>>>>
>>>> That offset is the byte offset of the first selected element from the
>>>> start of a vector in memory, which is also the way that SUBREG_BYTEs
>>>> are counted.  For little-endian it gives the offset of the lsb of the
>>>> slice, while for big-endian it gives the offset of the msb (which is
>>>> also how SUBREG_BYTEs work).
>>>>
>>>> The simplify_subreg_regno should cope with both single-register vectors
>>>> and multi-register vectors.
>>> Sorry for the delayed response to this.
>>>
>>> Thanks for the tip. Here's an improved patch that implements the 
>>> simplify_subreg_regno () method of eliminating redundant moves. Regarding the 
>>> test case, I failed to get the ppc back-end to generate RTL pattern
>>> that this
>>> patch checks for. I can easily write a test case for aarch64(big and little 
>>> endian) on these lines
>>>
>>> typedef float float32x4_t __attribute__ ((__vector_size__ (16)));
>>>
>>> float foo_be (float32x4_t x)
>>> {
>>>    return x[3];
>>> }
>>>
>>> float foo_le (float32x4_t x)
>>> {
>>>    return x[0];
>>> }
>>>
>>> where I know that the vector indexing will generate a vec_select on
>>> the same src and dst regs that could be optimized away and hence test
>>> it. But I'm struggling to get a test case that the ppc altivec
>>> back-end will generate such a vec_select for. I see that altivec does
>>> not define vec_extract, so a simple indexing like this seems to happen
>>> via memory. Also, I don't know enough about the ppc PCS or
>>> architecture to write a test that will check for this optimization
>>> opportunity on same src and dst hard-registers. Any hints?
>> 
>> Me neither, sorry.
>> 
>> FWIW, the MIPS tests:
>> 
>>   typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
>>   void bar (float);
>>   void foo_be (float32x2_t x) { bar (x[1]); }
>>   void foo_le (float32x2_t x) { bar (x[0]); }
>> 
>> also exercise it, but I don't think they add anything over the aarch64
>> versions.  I can add them to the testsuite anyway if it helps though.
>> 
>>> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
>>> index 0cd0c7e..ca25ce5 100644
>>> --- a/gcc/rtlanal.c
>>> +++ b/gcc/rtlanal.c
>>> @@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
>>>        dst = SUBREG_REG (dst);
>>>      }
>>>  
>>> +  /* It is a NOOP if destination overlaps with selected src vector
>>> +     elements.  */
>>> +  if (GET_CODE (src) == VEC_SELECT
>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>> +      && HARD_REGISTER_P (XEXP (src, 0))
>>> +      && HARD_REGISTER_P (dst))
>>> +    {
>>> +      rtx par = XEXP (src, 1);
>>> +      rtx src0 = XEXP (src, 0);
>>> +      HOST_WIDE_INT offset =
>>> +	GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0));
>>> +
>>> +      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
>>> +				    offset, GET_MODE (dst)) == (int)REGNO (dst);
>>> +    }
>>> +
>> 
>> Since this also (correctly) triggers for vector results, we need to keep
>> the check for consecutive indices that you had originally.  (It's always
>> the first index that should be used for the simplify_subreg_regno though.)
>> 
>> Looks good to me otherwise, thanks.
>
> Thanks Richard. Here is a revised patch. Sorry about the delay - I was
> investigating to make sure an LRA ICE I was seeing on aarch64 was
> unrelated to this patch. I've added a test case that I expect to pass
> for aarch64. I've also added the tests that you suggested for MIPS,
> but haven't checked for the target because I'm not sure what
> optimizations happen on MIPS.

Thanks, looks good to me, but I can't approve it.  Just one minor
formatting nit:

> +      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
> +				    offset, GET_MODE (dst)) == (int)REGNO (dst);

space after "(int)".

Richard
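
To make the quoted offset-to-register reasoning concrete, here is a toy
model in C. It is a deliberate simplification and not the real
simplify_subreg_regno, which also consults target hooks and mode
constraints; the function name toy_subreg_regno and all of its parameters
are hypothetical. It assumes hard registers are numbered in memory order
(as in the V4HI diagram quoted above) and that a slice smaller than one
register can only be named when it is that register's low-order part.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only.  A vector starts in hard register BASE_REGNO and each
   hard register holds REG_BYTES bytes.  Returns the hard register that
   would hold a slice of SLICE_BYTES bytes starting at BYTE_OFFSET
   (memory order), or -1 if this simple model cannot name the slice.  */
static int
toy_subreg_regno (int base_regno, int reg_bytes, int byte_offset,
                  int slice_bytes, bool big_endian)
{
  assert (reg_bytes > 0 && slice_bytes > 0 && byte_offset >= 0);

  int reg_index = byte_offset / reg_bytes;
  int within = byte_offset % reg_bytes;

  if (slice_bytes >= reg_bytes)
    {
      /* Whole registers: the slice must start on a register boundary
         and cover a whole number of registers.  */
      if (within != 0 || slice_bytes % reg_bytes != 0)
        return -1;
    }
  else
    {
      /* Partial register: only the low-order part qualifies.  In memory
         order that part comes first on little-endian and last on
         big-endian.  */
      int lowpart_start = big_endian ? reg_bytes - slice_bytes : 0;
      if (within != lowpart_start)
        return -1;
    }
  return base_regno + reg_index;
}
```

For the V4HI example spread over registers 32 and 33, lanes {0, 1} (offset
0, 4 bytes) map to register 32 and lanes {2, 3} (offset 4) map to register
33 in this model, while lanes {1, 2} (offset 2) straddle both and yield -1.
For a four-float vector held in a single 16-byte register, lane 0 is the
low part on little-endian and lane 3 on big-endian, which is why the test
cases in this thread index x[0] and x[3] respectively.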

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-04 16:07                       ` Tejas Belagod
  2013-12-04 16:14                         ` H.J. Lu
  2013-12-04 17:36                         ` Richard Sandiford
@ 2013-12-04 20:04                         ` Jeff Law
  2013-12-05 16:12                           ` Tejas Belagod
  2020-04-22 16:43                         ` [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279] (was: [Patch, RTL] Eliminate redundant vec_select moves) Thomas Schwinge
  3 siblings, 1 reply; 76+ messages in thread
From: Jeff Law @ 2013-12-04 20:04 UTC (permalink / raw)
  To: Tejas Belagod, Bill Schmidt, gcc-patches, rdsandiford

On 12/04/13 09:06, Tejas Belagod wrote:
> Richard Sandiford wrote:
>> Tejas Belagod <tbelagod@arm.com> writes:
>>> Richard Sandiford wrote:
>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>> The problem is that one reg rtx can span several hard registers.
>>>>>> E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
>>>>>> but it might instead represent two 32-bit registers (nos. 32 and 33).
>>>>>> Obviously the latter's not very likely for vectors this small,
>>>>>> but more likely for larger ones (including on NEON IIRC).
>>>>>>
>>>>>> So if we had 2 32-bit registers being treated as a V4HI, it would be:
>>>>>>
>>>>>>    <--32--><--33-->
>>>>>>    msb          lsb
>>>>>>    0000111122223333
>>>>>>    VVVVVVVV
>>>>>>    00001111
>>>>>>    msb  lsb
>>>>>>    <--32-->
>>>>>>
>>>>>> for big endian and:
>>>>>>
>>>>>>    <--33--><--32-->
>>>>>>    msb          lsb
>>>>>>    3333222211110000
>>>>>>            VVVVVVVV
>>>>>>            11110000
>>>>>>            msb  lsb
>>>>>>            <--32-->
>>>>>>
>>>>>> for little endian.
>>>>> Ah, ok, that makes things clearer. Thanks for that.
>>>>>
>>>>> I can't find any helper function that figures out if we're writing
>>>>> partial or
>>>>> full result regs. Would something like
>>>>>
>>>>>      REGNO (src) == REGNO (dst) &&
>>>>>      HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1
>>>>>
>>>>> be a sane check for partial result regs?
>>>> Yeah, that should work.  I think a more general alternative would be:
>>>>
>>>>   simplify_subreg_regno (REGNO (src), GET_MODE (src),
>>>>                          offset, GET_MODE (dst)) == (int) REGNO (dst)
>>>>
>>>> where:
>>>>
>>>>   offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP
>>>> (sel, 0, 0))
>>>>
>>>> That offset is the byte offset of the first selected element from the
>>>> start of a vector in memory, which is also the way that SUBREG_BYTEs
>>>> are counted.  For little-endian it gives the offset of the lsb of the
>>>> slice, while for big-endian it gives the offset of the msb (which is
>>>> also how SUBREG_BYTEs work).
>>>>
>>>> The simplify_subreg_regno should cope with both single-register vectors
>>>> and multi-register vectors.
>>> Sorry for the delayed response to this.
>>>
>>> Thanks for the tip. Here's an improved patch that implements the
>>> simplify_subreg_regno () method of eliminating redundant moves.
>>> Regarding the test case, I failed to get the ppc back-end to generate
>>> RTL pattern that this patch checks for. I can easily write a test
>>> case for aarch64(big and little endian) on these lines
>>>
>>> typedef float float32x4_t __attribute__ ((__vector_size__ (16)));
>>>
>>> float foo_be (float32x4_t x)
>>> {
>>>    return x[3];
>>> }
>>>
>>> float foo_le (float32x4_t x)
>>> {
>>>    return x[0];
>>> }
>>>
>>> where I know that the vector indexing will generate a vec_select on
>>> the same src and dst regs that could be optimized away and hence test
>>> it. But I'm struggling to get a test case that the ppc altivec
>>> back-end will generate such a vec_select for. I see that altivec does
>>> not define vec_extract, so a simple indexing like this seems to happen
>>> via memory. Also, I don't know enough about the ppc PCS or
>>> architecture to write a test that will check for this optimization
>>> opportunity on same src and dst hard-registers. Any hints?
>>
>> Me neither, sorry.
>>
>> FWIW, the MIPS tests:
>>
>>   typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
>>   void bar (float);
>>   void foo_be (float32x2_t x) { bar (x[1]); }
>>   void foo_le (float32x2_t x) { bar (x[0]); }
>>
>> also exercise it, but I don't think they add anything over the aarch64
>> versions.  I can add them to the testsuite anyway if it helps though.
>>
>>> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
>>> index 0cd0c7e..ca25ce5 100644
>>> --- a/gcc/rtlanal.c
>>> +++ b/gcc/rtlanal.c
>>> @@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
>>>        dst = SUBREG_REG (dst);
>>>      }
>>>
>>> +  /* It is a NOOP if destination overlaps with selected src vector
>>> +     elements.  */
>>> +  if (GET_CODE (src) == VEC_SELECT
>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>> +      && HARD_REGISTER_P (XEXP (src, 0))
>>> +      && HARD_REGISTER_P (dst))
>>> +    {
>>> +      rtx par = XEXP (src, 1);
>>> +      rtx src0 = XEXP (src, 0);
>>> +      HOST_WIDE_INT offset =
>>> +    GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0,
>>> 0));
>>> +
>>> +      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
>>> +                    offset, GET_MODE (dst)) == (int)REGNO (dst);
>>> +    }
>>> +
>>
>> Since this also (correctly) triggers for vector results, we need to keep
>> the check for consecutive indices that you had originally.  (It's always
>> the first index that should be used for the simplify_subreg_regno
>> though.)
>>
>> Looks good to me otherwise, thanks.
>
> Thanks Richard. Here is a revised patch. Sorry about the delay - I was
> investigating to make sure an LRA ICE I was seeing on aarch64 was
> unrelated to this patch. I've added a test case that I expect to pass
> for aarch64. I've also added the tests that you suggested for MIPS, but
> haven't checked for the target because I'm not sure what optimizations
> happen on MIPS.
>
> OK for trunk?
>
> Thanks,
> Tejas.
>
> 2013-12-04  Tejas Belagod  <tejas.belagod@arm.com>
>
> gcc/
>      * rtlanal.c (set_noop_p): Return nonzero in case of redundant
> vec_select
>      for overlapping register lanes.
>
> testsuite/
>      * config/gcc.dg/vect/vect-nop-move.c: New.
Per HJ's request please test vect-nop-move on x86/x86_64 and if the 
redundant move is properly eliminated, enable the test on those targets 
(i?86-*-* x86_64-*-*).

Approved with that change for the trunk.

jeff

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-04 17:31                             ` H.J. Lu
@ 2013-12-05 13:17                               ` Tejas Belagod
  2013-12-05 13:30                                 ` H.J. Lu
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-12-05 13:17 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jeff Law, Bill Schmidt, gcc-patches, Richard Sandiford

H.J. Lu wrote:
> On Wed, Dec 4, 2013 at 9:29 AM, Jeff Law <law@redhat.com> wrote:
>> On 12/04/13 09:14, H.J. Lu wrote:
>>
>>>> +
>>>> +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
>>>> aarch64*-*-* } } } */
>>>
>>> Any particular reason why it doesn't work for x86?
>> I don't think so.  I'm pretty sure Tejas is focused on ARM platforms for the
>> obvious reason.
>>
> 
> Then please add "i?86-*-* x86_64-*-*".

Hi,

I tried this test on x86_64. Though the same RTL gets generated

   (set (reg:SF) (vec_select:SF (reg:V4SF) (parallel [(const_int 0)])))

for -msse2, this optimization does not seem to trigger. Only later in a 
post-reload-split does it get eliminated to something like

    (set (reg:SF 21 xmm0) (reg:SF 21 xmm0))

I suspect simplify_subreg_regno () may not be returning what we want here - 
sorry, I don't know enough about x86 to debug deeper.

I could either keep this test case as is or if you could give it a quick look to 
see why it does not trigger, it would be useful to add x86 to this test.

Thanks,
Tejas.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-05 13:17                               ` Tejas Belagod
@ 2013-12-05 13:30                                 ` H.J. Lu
  2013-12-05 13:42                                   ` Kirill Yukhin
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-05 13:30 UTC (permalink / raw)
  To: Tejas Belagod, Yukhin, Kirill
  Cc: Jeff Law, Bill Schmidt, gcc-patches, Richard Sandiford

On Thu, Dec 5, 2013 at 5:17 AM, Tejas Belagod <tbelagod@arm.com> wrote:
> H.J. Lu wrote:
>>
>> On Wed, Dec 4, 2013 at 9:29 AM, Jeff Law <law@redhat.com> wrote:
>>>
>>> On 12/04/13 09:14, H.J. Lu wrote:
>>>
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
>>>>> aarch64*-*-* } } } */
>>>>
>>>>
>>>> Any particular reason why it doesn't work for x86?
>>>
>>> I don't think so.  I'm pretty sure Tejas is focused on ARM platforms for
>>> the
>>> obvious reason.
>>>
>>
>> Then please add "i?86-*-* x86_64-*-*".
>
>
> Hi,
>
> I tried this test on x86_64. Though the same RTL gets generated
>
>   (set (reg:SF) (vec_select:SF (reg:V4SF) (parallel [(const_int 0)])))
>
> for -msse2, this optimization does not seem to trigger. Only later in a
> post-reload-split does it get eliminated to something like
>
>    (set (reg:SF 21 xmm0) (reg:SF 21 xmm0))
>
> I suspect simplify_subreg_regno () may not be returning what we want here -
> sorry, I don't know enough about x86 to debug deeper.

Kirill, can you take a look why it doesn't work for x86?

> I could either keep this test case as is or if you could give it a quick
> look to see why it does not trigger, it would be useful to add x86 to this
> test.
>
> Thanks,
> Tejas.
>



-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-05 13:30                                 ` H.J. Lu
@ 2013-12-05 13:42                                   ` Kirill Yukhin
  2013-12-09  6:51                                     ` Kirill Yukhin
  0 siblings, 1 reply; 76+ messages in thread
From: Kirill Yukhin @ 2013-12-05 13:42 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Tejas Belagod, Yukhin, Kirill, Jeff Law, Bill Schmidt,
	gcc-patches, Richard Sandiford

Hello,
On 05 Dec 05:30, H.J. Lu wrote:
> Kirill, can you take a look why it doesn't work for x86?
Okay, I'll look at this.

--
Thanks, K

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-04 20:04                         ` Jeff Law
@ 2013-12-05 16:12                           ` Tejas Belagod
  2013-12-05 16:20                             ` Jeff Law
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-12-05 16:12 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bill Schmidt, gcc-patches, rdsandiford, kirill.yukhin, H.J. Lu

Jeff Law wrote:
> On 12/04/13 09:06, Tejas Belagod wrote:
>> Richard Sandiford wrote:
>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>> Richard Sandiford wrote:
>>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>>> The problem is that one reg rtx can span several hard registers.
>>>>>>> E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
>>>>>>> but it might instead represent two 32-bit registers (nos. 32 and 33).
>>>>>>> Obviously the latter's not very likely for vectors this small,
>>>>>>> but more likely for larger ones (including on NEON IIRC).
>>>>>>>
>>>>>>> So if we had 2 32-bit registers being treated as a V4HI, it would be:
>>>>>>>
>>>>>>>    <--32--><--33-->
>>>>>>>    msb          lsb
>>>>>>>    0000111122223333
>>>>>>>    VVVVVVVV
>>>>>>>    00001111
>>>>>>>    msb  lsb
>>>>>>>    <--32-->
>>>>>>>
>>>>>>> for big endian and:
>>>>>>>
>>>>>>>    <--33--><--32-->
>>>>>>>    msb          lsb
>>>>>>>    3333222211110000
>>>>>>>            VVVVVVVV
>>>>>>>            11110000
>>>>>>>            msb  lsb
>>>>>>>            <--32-->
>>>>>>>
>>>>>>> for little endian.
>>>>>> Ah, ok, that makes things clearer. Thanks for that.
>>>>>>
>>>>>> I can't find any helper function that figures out if we're writing
>>>>>> partial or
>>>>>> full result regs. Would something like
>>>>>>
>>>>>>      REGNO (src) == REGNO (dst) &&
>>>>>>      HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1
>>>>>>
>>>>>> be a sane check for partial result regs?
>>>>> Yeah, that should work.  I think a more general alternative would be:
>>>>>
>>>>>   simplify_subreg_regno (REGNO (src), GET_MODE (src),
>>>>>                          offset, GET_MODE (dst)) == (int) REGNO (dst)
>>>>>
>>>>> where:
>>>>>
>>>>>   offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP
>>>>> (sel, 0, 0))
>>>>>
>>>>> That offset is the byte offset of the first selected element from the
>>>>> start of a vector in memory, which is also the way that SUBREG_BYTEs
>>>>> are counted.  For little-endian it gives the offset of the lsb of the
>>>>> slice, while for big-endian it gives the offset of the msb (which is
>>>>> also how SUBREG_BYTEs work).
>>>>>
>>>>> The simplify_subreg_regno should cope with both single-register vectors
>>>>> and multi-register vectors.
>>>> Sorry for the delayed response to this.
>>>>
>>>> Thanks for the tip. Here's an improved patch that implements the
>>>> simplify_subreg_regno () method of eliminating redundant moves.
>>>> Regarding the test case, I failed to get the ppc back-end to generate
>>>> RTL pattern that this patch checks for. I can easily write a test
>>>> case for aarch64(big and little endian) on these lines
>>>>
>>>> typedef float float32x4_t __attribute__ ((__vector_size__ (16)));
>>>>
>>>> float foo_be (float32x4_t x)
>>>> {
>>>>    return x[3];
>>>> }
>>>>
>>>> float foo_le (float32x4_t x)
>>>> {
>>>>    return x[0];
>>>> }
>>>>
>>>> where I know that the vector indexing will generate a vec_select on
>>>> the same src and dst regs that could be optimized away and hence test
>>>> it. But I'm struggling to get a test case that the ppc altivec
>>>> back-end will generate such a vec_select for. I see that altivec does
>>>> not define vec_extract, so a simple indexing like this seems to happen
>>>> via memory. Also, I don't know enough about the ppc PCS or
>>>> architecture to write a test that will check for this optimization
>>>> opportunity on same src and dst hard-registers. Any hints?
>>> Me neither, sorry.
>>>
>>> FWIW, the MIPS tests:
>>>
>>>   typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
>>>   void bar (float);
>>>   void foo_be (float32x2_t x) { bar (x[1]); }
>>>   void foo_le (float32x2_t x) { bar (x[0]); }
>>>
>>> also exercise it, but I don't think they add anything over the aarch64
>>> versions.  I can add them to the testsuite anyway if it helps though.
>>>
>>>> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
>>>> index 0cd0c7e..ca25ce5 100644
>>>> --- a/gcc/rtlanal.c
>>>> +++ b/gcc/rtlanal.c
>>>> @@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
>>>>        dst = SUBREG_REG (dst);
>>>>      }
>>>>
>>>> +  /* It is a NOOP if destination overlaps with selected src vector
>>>> +     elements.  */
>>>> +  if (GET_CODE (src) == VEC_SELECT
>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>> +      && HARD_REGISTER_P (XEXP (src, 0))
>>>> +      && HARD_REGISTER_P (dst))
>>>> +    {
>>>> +      rtx par = XEXP (src, 1);
>>>> +      rtx src0 = XEXP (src, 0);
>>>> +      HOST_WIDE_INT offset =
>>>> +    GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0,
>>>> 0));
>>>> +
>>>> +      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
>>>> +                    offset, GET_MODE (dst)) == (int)REGNO (dst);
>>>> +    }
>>>> +
>>> Since this also (correctly) triggers for vector results, we need to keep
>>> the check for consecutive indices that you had originally.  (It's always
>>> the first index that should be used for the simplify_subreg_regno
>>> though.)
>>>
>>> Looks good to me otherwise, thanks.
>> Thanks Richard. Here is a revised patch. Sorry about the delay - I was
>> investigating to make sure an LRA ICE I was seeing on aarch64 was
>> unrelated to this patch. I've added a test case that I expect to pass
>> for aarch64. I've also added the tests that you suggested for MIPS, but
>> haven't checked for the target because I'm not sure what optimizations
>> happen on MIPS.
>>
>> OK for trunk?
>>
>> Thanks,
>> Tejas.
>>
>> 2013-12-04  Tejas Belagod  <tejas.belagod@arm.com>
>>
>> gcc/
>>      * rtlanal.c (set_noop_p): Return nonzero in case of redundant
>> vec_select
>>      for overlapping register lanes.
>>
>> testsuite/
>>      * config/gcc.dg/vect/vect-nop-move.c: New.
> Per HJ's request please test vect-nop-move on x86/x86_64 and if the 
> redundant move is properly eliminated, enable the test on those targets 
> (i?86-*-* x86_64-*-*).
> 
> Approved with that change for the trunk.

Thanks Jeff.

Now that Kirill's looking at why this doesn't work for x86, could I check this 
in without enabling vect-nop-move.c for targets (i?86-*-* x86_64-*-*)? If not, 
I'm happy to wait.

Thanks,
Tejas.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-05 16:12                           ` Tejas Belagod
@ 2013-12-05 16:20                             ` Jeff Law
  0 siblings, 0 replies; 76+ messages in thread
From: Jeff Law @ 2013-12-05 16:20 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Bill Schmidt, gcc-patches, rdsandiford, kirill.yukhin, H.J. Lu

On 12/05/13 09:12, Tejas Belagod wrote:

>
> Now that Kirill's looking at why this doesn't work for x86, could I
> check this in without enabling vect-nop-move.c for targets (i?86-*-*
> x86_64-*-*)? If not, I'm happy to wait.
Yea, that's fine with me.
jeff

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-04 16:14                         ` H.J. Lu
  2013-12-04 17:29                           ` Jeff Law
@ 2013-12-05 22:38                           ` Jakub Jelinek
  2013-12-06 11:36                             ` Tejas Belagod
  2013-12-06 17:12                             ` Tejas Belagod
  1 sibling, 2 replies; 76+ messages in thread
From: Jakub Jelinek @ 2013-12-05 22:38 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: H.J. Lu, Bill Schmidt, gcc-patches, Richard Sandiford

On Wed, Dec 04, 2013 at 08:14:43AM -0800, H.J. Lu wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
> > @@ -0,0 +1,64 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_float } */
> > +/* { dg-options "-O3 -fdump-rtl-combine-details" } */

Please change dg-options to dg-additional-options, otherwise
it overrides the target basic vectorization options and thus
fails on i686-linux.

> > +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
> > aarch64*-*-* } } } */
> 
> Any particular reason why it doesn't work for x86?
> 
> > +/* { dg-final { cleanup-rtl-dump "combine" } } */

You also need to add

/* { dg-final { cleanup-tree-dump "vect" } } */

because all vectorizer tests dump *.vect dumps.

	Jakub

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-05 22:38                           ` Jakub Jelinek
@ 2013-12-06 11:36                             ` Tejas Belagod
  2013-12-06 17:12                             ` Tejas Belagod
  1 sibling, 0 replies; 76+ messages in thread
From: Tejas Belagod @ 2013-12-06 11:36 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, Bill Schmidt, gcc-patches, Richard Sandiford

Jakub Jelinek wrote:
> On Wed, Dec 04, 2013 at 08:14:43AM -0800, H.J. Lu wrote:
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
>>> @@ -0,0 +1,64 @@
>>> +/* { dg-do run } */
>>> +/* { dg-require-effective-target vect_float } */
>>> +/* { dg-options "-O3 -fdump-rtl-combine-details" } */
> 
> Please change dg-options to dg-additional-options, otherwise
> it overrides the target basic vectorization options and thus
> fails on i686-linux.

Sorry, OK, I'll fix that. Also, I'm curious to know what these failures look like: 
what default options does this override to cause the failure (e.g. this test 
doesn't need a cost model)?

> 
>>> +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
>>> aarch64*-*-* } } } */
>> Any particular reason why it doesn't work for x86?
>>
>>> +/* { dg-final { cleanup-rtl-dump "combine" } } */
> 
> You also need to add
> 
> /* { dg-final { cleanup-tree-dump "vect" } } */

OK, will fix.

Thanks,
Tejas.

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-05 22:38                           ` Jakub Jelinek
  2013-12-06 11:36                             ` Tejas Belagod
@ 2013-12-06 17:12                             ` Tejas Belagod
  2013-12-06 17:20                               ` Jakub Jelinek
  1 sibling, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-12-06 17:12 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, Bill Schmidt, gcc-patches, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 1011 bytes --]

Jakub Jelinek wrote:
> On Wed, Dec 04, 2013 at 08:14:43AM -0800, H.J. Lu wrote:
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
>>> @@ -0,0 +1,64 @@
>>> +/* { dg-do run } */
>>> +/* { dg-require-effective-target vect_float } */
>>> +/* { dg-options "-O3 -fdump-rtl-combine-details" } */
> 
> Please change dg-options to dg-additional-options, otherwise
> it overrides the target basic vectorization options and thus
> fails on i686-linux.
> 
>>> +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
>>> aarch64*-*-* } } } */
>> Any particular reason why it doesn't work for x86?
>>
>>> +/* { dg-final { cleanup-rtl-dump "combine" } } */
> 
> You also need to add
> 
> /* { dg-final { cleanup-tree-dump "vect" } } */
> 
> because all vectorizer tests dump *.vect dumps.

Here is a patch, OK to commit?

Thanks,
Tejas.

2013-12-06  Tejas Belagod  <tejas.belagod@arm.com>

testsuite/
           * gcc.dg/vect/vect-nop-move.c: Fix dg options.

[-- Attachment #2: vect-nop-move.txt --]
[-- Type: text/plain, Size: 696 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
index 1941933..98f72f1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */ 
 /* { dg-require-effective-target vect_float } */
-/* { dg-options "-O3 -fdump-rtl-combine-details" } */
+/* { dg-additional-options "-fdump-rtl-combine-details" } */
 
 extern void abort (void);
 
@@ -62,3 +62,4 @@ main()
 
 /* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target aarch64*-*-* } } } */
 /* { dg-final { cleanup-rtl-dump "combine" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-06 17:12                             ` Tejas Belagod
@ 2013-12-06 17:20                               ` Jakub Jelinek
  0 siblings, 0 replies; 76+ messages in thread
From: Jakub Jelinek @ 2013-12-06 17:20 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: H.J. Lu, Bill Schmidt, gcc-patches, Richard Sandiford

On Fri, Dec 06, 2013 at 05:12:08PM +0000, Tejas Belagod wrote:
> 2013-12-06  Tejas Belagod  <tejas.belagod@arm.com>
> 
> testsuite/
>           * gcc.dg/vect/vect-nop-move.c: Fix dg options.

Ok, thanks.

> --- a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
> @@ -1,6 +1,6 @@
>  /* { dg-do run } */ 
>  /* { dg-require-effective-target vect_float } */
> -/* { dg-options "-O3 -fdump-rtl-combine-details" } */
> +/* { dg-additional-options "-fdump-rtl-combine-details" } */
>  
>  extern void abort (void);
>  
> @@ -62,3 +62,4 @@ main()
>  
>  /* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target aarch64*-*-* } } } */
>  /* { dg-final { cleanup-rtl-dump "combine" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */


	Jakub

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-05 13:42                                   ` Kirill Yukhin
@ 2013-12-09  6:51                                     ` Kirill Yukhin
  2013-12-09  9:56                                       ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: Kirill Yukhin @ 2013-12-09  6:51 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Tejas Belagod, Yukhin, Kirill, Jeff Law, Bill Schmidt,
	gcc-patches, Richard Sandiford, Uros Bizjak, Richard Henderson,
	Jakub Jelinek

Hello,

On 05 Dec 16:40, Kirill Yukhin wrote:
> On 05 Dec 05:30, H.J. Lu wrote:
> > Kirill, can you take a look why it doesn't work for x86?
> Okay, I'll look at this.

I've looked at this. It seems that `CANNOT_CHANGE_MODE_CLASS'
is too conservative for x86.

In rtlanal.c we have `simplify_subreg_regno' which call target
hook `REG_CANNOT_CHANGE_MODE_P'. It takes only 3 arguments:
from mode, to mode and regclass.

Hook in x86 called `ix86_cannot_change_mode_class' and comment
says that we cannot change mode for nonzero offsets, which sounds
quite reasonable. That is why this hook returns `true' for this
tuple <V4SF, SF, FIRST_SSE_REG> and `simplify_subreg_regno'
prohibits simplification of that:
  (set (reg:SF 21 xmm0 [orig:86 D.1816 ] [86])
       (vec_select:SF (reg:V4SF 21 xmm0 [87])
          (parallel [(const_int 0 [0])])))

I think we can extend the hook and add `offset in frommode' to it.
We may set it to -1 for the cases where it is unknown and work
conservatively in the target hook.
For most cases offset is known and we could pass it to the hook.
This will require changes throughout all targets though.

Alternatively, we may introduce another target hook, say
`CANNOT_CHANGE_MODE_CLASS_OFFSET' with same args as
`CANNOT_CHANGE_MODE_CLASS' + offset and which will be defaulted to it.
For x86 (and possibly other targets) we'll implement this hook, which
will check offset.

What do you think?
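
To make the first option concrete, here is a minimal standalone sketch in C.
It is not GCC code: the `toy_' names, the three-entry mode enum and
TOY_OFFSET_UNKNOWN are made up for illustration, and it only models vector
registers. It assumes the existing check rejects every narrowing mode change,
and that an offset-aware check would reject only nonzero or unknown offsets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes; not GCC's enum machine_mode.  */
enum toy_mode { TOY_SF, TOY_DF, TOY_V4SF };

/* Hypothetical marker for "offset not known at the call site".  */
#define TOY_OFFSET_UNKNOWN (-1)

/* Size in bytes of each toy mode.  */
int
toy_mode_size (enum toy_mode m)
{
  switch (m)
    {
    case TOY_SF:   return 4;
    case TOY_DF:   return 8;
    case TOY_V4SF: return 16;
    }
  return 0;
}

/* Model of the check without an offset: since the offset cannot be seen,
   every narrowing change in a vector register is rejected.  */
bool
toy_cannot_change_mode (enum toy_mode from, enum toy_mode to)
{
  if (from == to)
    return false;
  return toy_mode_size (to) < toy_mode_size (from);
}

/* Model of the extended check: a narrowing change is rejected only when
   the offset is nonzero, or unknown (in which case stay conservative).  */
bool
toy_cannot_change_mode_offset (enum toy_mode from, int offset,
                               enum toy_mode to)
{
  assert (offset == TOY_OFFSET_UNKNOWN
          || (offset >= 0 && offset < toy_mode_size (from)));

  if (from == to)
    return false;
  if (offset == TOY_OFFSET_UNKNOWN)
    return toy_cannot_change_mode (from, to);
  return offset != 0 && toy_mode_size (to) < toy_mode_size (from);
}
```

In this model <V4SF, offset 0, SF> becomes allowed, while offset 4 and the
unknown offset are still rejected.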

--
Thanks, K

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-09  6:51                                     ` Kirill Yukhin
@ 2013-12-09  9:56                                       ` Tejas Belagod
  2013-12-09 12:01                                         ` Richard Sandiford
  2013-12-09 13:00                                         ` H.J. Lu
  0 siblings, 2 replies; 76+ messages in thread
From: Tejas Belagod @ 2013-12-09  9:56 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: H.J. Lu, Yukhin, Kirill, Jeff Law, Bill Schmidt, gcc-patches,
	Richard Sandiford, Uros Bizjak, Richard Henderson, Jakub Jelinek

Kirill Yukhin wrote:
> Hello,
> 
> On 05 Dec 16:40, Kirill Yukhin wrote:
>> On 05 Dec 05:30, H.J. Lu wrote:
>>> Kirill, can you take a look why it doesn't work for x86?
>> Okay, I'll look at this.
> 
> I've looked at this. It seems that `CANNOT_CHANGE_MODE_CLASS'
> is too conservative for x86.
> 
> In rtlanal.c we have `simplify_subreg_regno' which call target
> hook `REG_CANNOT_CHANGE_MODE_P'. It takes only 3 arguments:
> from mode, to mode and regclass.
> 
> Hook in x86 called `ix86_cannot_change_mode_class' and comment
> says that we cannot change mode for nonzero offsets, which sounds
> quite reasonable. That is why this hook returns `true' for this
> tuple <V4SF, SF, FIRST_SSE_REG> and `simplify_subreg_regno'
> prohibits simplification of that:
>   (set (reg:SF 21 xmm0 [orig:86 D.1816 ] [86])
>        (vec_select:SF (reg:V4SF 21 xmm0 [87])
>           (parallel [(const_int 0 [0])])))
> 
> I think we can extend the hook and add `offset in frommode' to it.
> We may set it to -1 for the cases where it is unknown and work
> conservatively in the target hook.
> For most cases offset is known and we could pass it to the hook.
> This will require changes throughout all targets though.
> 
> Alternatively, we may introduce another target hook, say
> `CANNOT_CHANGE_MODE_CLASS_OFFSET' with same args as
> `CANNOT_CHANGE_MODE_CLASS' + offset and which will be defaulted to it.
> For x86 (and possibly other targets) we'll implement this hook, which
> will check offset.
> 
> What do you think?
> 

I don't think CANNOT_CHANGE_MODE_CLASS has been designed with an intention to 
consider offsets. I thought all that magic about BYTE_OFFSET resolution into 
representable hardregs was done by subreg_get_info() where the 
info.representable is set to false if the BYTE_OFFSET of the subreg didn't map 
to a full hardreg. So if your (subreg:ymode (reg:xmode) off) maps to a full 
hardreg, simplify_subreg_regno should be returning the yregno automatically.

That's my understanding, please feel free to correct me if I'm incorrect.
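
As a rough illustration of what I mean, here is a toy C model of that
resolution step. It is only a sketch under simplifying assumptions
(little-endian, all hard registers of one size); `toy_subreg_regno' and its
parameters are hypothetical names and this is not how subreg_get_info() is
actually written.

```c
#include <assert.h>

/* Toy model, little-endian only.  XREGNO is the first hard register
   holding an XSIZE-byte value, every hard register is REG_SIZE bytes
   wide, and we ask for a YSIZE-byte piece starting at byte OFFSET.
   Return the hard register number of the piece, or -1 if the piece
   cannot be represented as a hard register.  */
int
toy_subreg_regno (int xregno, unsigned int xsize, unsigned int ysize,
                  unsigned int offset, unsigned int reg_size)
{
  assert (reg_size > 0 && ysize > 0 && offset + ysize <= xsize);

  /* The piece must start at the low end of some hard register.  */
  if (offset % reg_size != 0)
    return -1;

  /* A piece wider than one register must cover whole registers.  */
  if (ysize > reg_size && ysize % reg_size != 0)
    return -1;

  return xregno + (int) (offset / reg_size);
}
```

For the x86 example above (register 21, 16-byte V4SF, 4-byte SF), offset 0
resolves to register 21 in this model and offset 4 is not representable.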

Thanks,
Tejas.

> --
> Thanks, K
> 


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-09  9:56                                       ` Tejas Belagod
@ 2013-12-09 12:01                                         ` Richard Sandiford
  2013-12-09 13:00                                         ` H.J. Lu
  1 sibling, 0 replies; 76+ messages in thread
From: Richard Sandiford @ 2013-12-09 12:01 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Kirill Yukhin, H.J. Lu, Yukhin, Kirill, Jeff Law, Bill Schmidt,
	gcc-patches, Uros Bizjak, Richard Henderson, Jakub Jelinek

Tejas Belagod <tbelagod@arm.com> writes:
> Kirill Yukhin wrote:
>> Hello,
>> 
>> On 05 Dec 16:40, Kirill Yukhin wrote:
>>> On 05 Dec 05:30, H.J. Lu wrote:
>>>> Kirill, can you take a look why it doesn't work for x86?
>>> Okay, I'll look at this.
>> 
>> I've looked at this. It seems that `CANNOT_CHANGE_MODE_CLASS'
>> is too conservative for x86.
>> 
>> In rtlanal.c we have `simplify_subreg_regno' which call target
>> hook `REG_CANNOT_CHANGE_MODE_P'. It takes only 3 arguments:
>> from mode, to mode and regclass.
>> 
>> Hook in x86 called `ix86_cannot_change_mode_class' and comment
>> says that we cannot change mode for nonzero offsets, which sounds
>> quite reasonable. That is why this hook returns `true' for this
>> tuple <V4SF, SF, FIRST_SSE_REG> and `simplify_subreg_regno'
>> prohibits simplification of that:
>>   (set (reg:SF 21 xmm0 [orig:86 D.1816 ] [86])
>>        (vec_select:SF (reg:V4SF 21 xmm0 [87])
>>           (parallel [(const_int 0 [0])])))
>> 
>> I think we can extend the hook and add `offset in frommode' to it.
>> We may set it to -1 for the cases where it is unknown and work
>> conservatively in the target hook.
>> For most cases offset is known and we could pass it to the hook.
>> This will require changes throughout all targets though.
>> 
>> Alternatively, we may introduce another target hook, say
>> `CANNOT_CHANGE_MODE_CLASS_OFFSET' with same args as
>> `CANNOT_CHANGE_MODE_CLASS' + offset and which will be defaulted to it.
>> For x86 (and possibly other targets) we'll implement this hook, which
>> will check offset.
>> 
>> What do you think?
>> 
>
> I don't think CANNOT_CHANGE_MODE_CLASS has been designed with an
> intention to consider offsets. I thought all that magic about
> BYTE_OFFSET resolution into representable hardregs was done by
> subreg_get_info() where the info.representable is set to false if the
> BYTE_OFFSET of the subreg didn't map to a full hardreg. So if your
> (subreg:ymode (reg:xmode) off) maps to a full hardreg,
> simplify_subreg_regno should be returning the yregno automatically.

I agree.  A subreg only reduces to a single hard register if the subreg
logically refers to the low part of the hard register.  That's a target-
independent requirement so the hook shouldn't need to worry about it.

I'm just speculating, but maybe the problem is that this was traditionally
keyed off word size.  If a subreg is smaller than a word then it must
correspond to the low part of the containing word.  So if words
are 32 bits or wider, things like (subreg:QI (reg:SI X) 1) and
(subreg:QI (reg:SI X) 2) are always invalid, even for pseudo Xs.
But that doesn't stop things like (subreg:QI (reg:DI X) 4) on 32-bit
little-endian targets.  So we can run into trouble when dealing with
wider-than-word registers, since whether the byte offset is representable
depends on the class.  And things like IRA would need this to be trapped
at the class level, rather than just for specific hard registers.

If that was the problem though, it still sounds like something that could
be handled in a target-independent way, via things like class_max_nregs.
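
To spell out the word-size rule I have in mind, here is a small C sketch.
It is a simplification that assumes a little-endian target and only covers
subregs narrower than a word; `toy_subword_subreg_ok' is a hypothetical name,
not an existing function.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, little-endian only.  Return true if an OUTER_SIZE-byte
   subreg at BYTE_OFFSET into an INNER_SIZE-byte register is the low part
   of the word that contains it.  Only sub-word subregs are modelled.  */
bool
toy_subword_subreg_ok (unsigned int outer_size, unsigned int inner_size,
                       unsigned int byte_offset, unsigned int word_size)
{
  assert (outer_size > 0 && outer_size < word_size);
  assert (byte_offset + outer_size <= inner_size);

  /* On little-endian the low part of a word sits at its first byte.  */
  return byte_offset % word_size == 0;
}
```

With 4-byte words this rejects QI at offsets 1 and 2 of an SI register but
accepts QI at offset 4 of a DI register, whatever class the DI register
ends up in.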

Thanks,
Richard

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-09  9:56                                       ` Tejas Belagod
  2013-12-09 12:01                                         ` Richard Sandiford
@ 2013-12-09 13:00                                         ` H.J. Lu
  2013-12-09 13:49                                           ` H.J. Lu
  1 sibling, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-09 13:00 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Kirill Yukhin, Yukhin, Kirill, Jeff Law, Bill Schmidt,
	gcc-patches, Richard Sandiford, Uros Bizjak, Richard Henderson,
	Jakub Jelinek

On Mon, Dec 9, 2013 at 1:56 AM, Tejas Belagod <tbelagod@arm.com> wrote:
> Kirill Yukhin wrote:
>>
>> Hello,
>>
>> On 05 Dec 16:40, Kirill Yukhin wrote:
>>>
>>> On 05 Dec 05:30, H.J. Lu wrote:
>>>>
>>>> Kirill, can you take a look why it doesn't work for x86?
>>>
>>> Okay, I'll look at this.
>>
>>
>> I've looked at this. It seems that `CANNOT_CHANGE_MODE_CLASS'
>> is too conservative for x86.
>>
>> In rtlanal.c we have `simplify_subreg_regno' which call target
>> hook `REG_CANNOT_CHANGE_MODE_P'. It takes only 3 arguments:
>> from mode, to mode and regclass.
>>
>> Hook in x86 called `ix86_cannot_change_mode_class' and comment
>> says that we cannot change mode for nonzero offsets, which sounds
>> quite reasonable. That is why this hook returns `true' for this
>> tuple <V4SF, SF, FIRST_SSE_REG> and `simplify_subreg_regno'
>> prohibits simplification of that:
>>   (set (reg:SF 21 xmm0 [orig:86 D.1816 ] [86])
>>        (vec_select:SF (reg:V4SF 21 xmm0 [87])
>>           (parallel [(const_int 0 [0])])))
>>
>> I think we can extend the hook and add `offset in frommode' to it.
>> We may set it to -1 for the cases where it is unknown and work
>> conservatively in the target hook.
>> For most cases offset is known and we could pass it to the hook.
>> This will require changes throughout all targets though.
>>
>> Alternatively, we may introduce another target hook, say
>> `CANNOT_CHANGE_MODE_CLASS_OFFSET' with same args as
>> `CANNOT_CHANGE_MODE_CLASS' + offset and which will be defaulted to it.
>> For x86 (and possibly other targets) we'll implement this hook, which
>> will check offset.
>>
>> What do you think?
>>
>
> I don't think CANNOT_CHANGE_MODE_CLASS has been designed with an intention
> to consider offsets. I thought all that magic about BYTE_OFFSET resolution
> into representable hardregs was done by subreg_get_info() where the
> info.representable is set to false if the BYTE_OFFSET of the subreg didn't
> map to a full hardreg. So if your (subreg:ymode (reg:xmode) off) maps to a
> full hardreg, simplify_subreg_regno should be returning the yregno
> automatically.
>
> That's my understanding, please feel free to correct me if I'm incorrect.

Kirill, ARM doesn't define HARD_REGNO_NREGS_HAS_PADDING
and i386 does.  Could that be the cause?

-- 
H.J.

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-09 13:00                                         ` H.J. Lu
@ 2013-12-09 13:49                                           ` H.J. Lu
  2013-12-09 22:08                                             ` H.J. Lu
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-09 13:49 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Kirill Yukhin, Yukhin, Kirill, Jeff Law, Bill Schmidt,
	gcc-patches, Richard Sandiford, Uros Bizjak, Richard Henderson,
	Jakub Jelinek

On Mon, Dec 9, 2013 at 5:00 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Dec 9, 2013 at 1:56 AM, Tejas Belagod <tbelagod@arm.com> wrote:
>> Kirill Yukhin wrote:
>>>
>>> Hello,
>>>
>>> On 05 Dec 16:40, Kirill Yukhin wrote:
>>>>
>>>> On 05 Dec 05:30, H.J. Lu wrote:
>>>>>
>>>>> Kirill, can you take a look why it doesn't work for x86?
>>>>
>>>> Okay, I'll look at this.
>>>
>>>
>>> I've looked at this. It seems that `CANNOT_CHANGE_MODE_CLASS'
>>> is too conservative for x86.
>>>
>>> In rtlanal.c we have `simplify_subreg_regno' which call target
>>> hook `REG_CANNOT_CHANGE_MODE_P'. It takes only 3 arguments:
>>> from mode, to mode and regclass.
>>>
>>> Hook in x86 called `ix86_cannot_change_mode_class' and comment
>>> says that we cannot change mode for nonzero offsets, which sounds
>>> quite reasonable. That is why this hook returns `true' for this
>>> tuple <V4SF, SF, FIRST_SSE_REG> and `simplify_subreg_regno'
>>> prohibits simplification of that:
>>>   (set (reg:SF 21 xmm0 [orig:86 D.1816 ] [86])
>>>        (vec_select:SF (reg:V4SF 21 xmm0 [87])
>>>           (parallel [(const_int 0 [0])])))
>>>
>>> I think we can extend the hook and add `offset in frommode' to it.
>>> We may set it to -1 for the cases where it is unknown and work
>>> conservatively in the target hook.
>>> For most cases offset is known and we could pass it to the hook.
>>> This will require changes throughout all targets though.
>>>
>>> Alternatively, we may introduce another target hook, say
>>> `CANNOT_CHANGE_MODE_CLASS_OFFSET' with same args as
>>> `CANNOT_CHANGE_MODE_CLASS' + offset and which will be defaulted to it.
>>> For x86 (and possibly other targets) we'll implement this hook, which
>>> will check offset.
>>>
>>> What do you think?
>>>
>>
>> I don't think CANNOT_CHANGE_MODE_CLASS has been designed with an intention
>> to consider offsets. I thought all that magic about BYTE_OFFSET resolution
>> into representable hardregs was done by subreg_get_info() where the
>> info.representable is set to false if the BYTE_OFFSET of the subreg didn't
>> map to a full hardreg. So if your (subreg:ymode (reg:xmode) off) maps to a
>> full hardreg, simplify_subreg_regno should be returning the yregno
>> automatically.
>>
>> That's my understanding, please feel free to correct me if I'm incorrect.
>
> Kirill, ARM doesn't define HARD_REGNO_NREGS_HAS_PADDING
> and i386 does.  Could that be the cause?
>

Nevermind. I don't think it does.


-- 
H.J.

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-09 13:49                                           ` H.J. Lu
@ 2013-12-09 22:08                                             ` H.J. Lu
  2013-12-10 14:53                                               ` Kirill Yukhin
  2013-12-10 16:07                                               ` Kirill Yukhin
  0 siblings, 2 replies; 76+ messages in thread
From: H.J. Lu @ 2013-12-09 22:08 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Kirill Yukhin, Yukhin, Kirill, Jeff Law, Bill Schmidt,
	gcc-patches, Richard Sandiford, Uros Bizjak, Richard Henderson,
	Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2862 bytes --]

On Mon, Dec 9, 2013 at 5:48 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Dec 9, 2013 at 5:00 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Dec 9, 2013 at 1:56 AM, Tejas Belagod <tbelagod@arm.com> wrote:
>>> Kirill Yukhin wrote:
>>>>
>>>> Hello,
>>>>
>>>> On 05 Dec 16:40, Kirill Yukhin wrote:
>>>>>
>>>>> On 05 Dec 05:30, H.J. Lu wrote:
>>>>>>
>>>>>> Kirill, can you take a look why it doesn't work for x86?
>>>>>
>>>>> Okay, I'll look at this.
>>>>
>>>>
>>>> I've looked at this. It seems that `CANNOT_CHANGE_MODE_CLASS'
>>>> is too conservative for x86.
>>>>
>>>> In rtlanal.c we have `simplify_subreg_regno' which call target
>>>> hook `REG_CANNOT_CHANGE_MODE_P'. It takes only 3 arguments:
>>>> from mode, to mode and regclass.
>>>>
>>>> Hook in x86 called `ix86_cannot_change_mode_class' and comment
>>>> says that we cannot change mode for nonzero offsets, which sounds
>>>> quite reasonable. That is why this hook returns `true' for this
>>>> tuple <V4SF, SF, FIRST_SSE_REG> and `simplify_subreg_regno'
>>>> prohibits simplification of that:
>>>>   (set (reg:SF 21 xmm0 [orig:86 D.1816 ] [86])
>>>>        (vec_select:SF (reg:V4SF 21 xmm0 [87])
>>>>           (parallel [(const_int 0 [0])])))
>>>>
>>>> I think we can extend the hook and add `offset in frommode' to it.
>>>> We may set it to -1 for the cases where it is unknown and work
>>>> conservatively in the target hook.
>>>> For most cases offset is known and we could pass it to the hook.
>>>> This will require changes throughout all targets though.
>>>>
>>>> Alternatively, we may introduce another target hook, say
>>>> `CANNOT_CHANGE_MODE_CLASS_OFFSET' with same args as
>>>> `CANNOT_CHANGE_MODE_CLASS' + offset and which will be defaulted to it.
>>>> For x86 (and possibly other targets) we'll implement this hook, which
>>>> will check offset.
>>>>
>>>> What do you think?
>>>>
>>>
>>> I don't think CANNOT_CHANGE_MODE_CLASS has been designed with an intention
>>> to consider offsets. I thought all that magic about BYTE_OFFSET resolution
>>> into representable hardregs was done by subreg_get_info() where the
>>> info.representable is set to false if the BYTE_OFFSET of the subreg didn't
>>> map to a full hardreg. So if your (subreg:ymode (reg:xmode) off) maps to a
>>> full hardreg, simplify_subreg_regno should be returning the yregno
>>> automatically.
>>>
>>> That's my understanding, please feel free to correct me if I'm incorrect.
>>
>> Kirill, ARM doesn't define HARD_REGNO_NREGS_HAS_PADDING
>> and i386 does.  Could that be the cause?
>>
>
> Nevermind. I don't think it does.
>
>

Hi Kirill,

Here is a patch to add offset to CANNOT_CHANGE_MODE_CLASS.
I used MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT offset
to indicate any offsets.
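
A small standalone sketch of why such a value can work as the "any offset"
marker. The `TOY_' names and the 256-bit maximum are assumptions for
illustration only; the second function just mirrors the shape of the
`offset != 0' check in the i386.c hunk.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BITS_PER_UNIT 8
/* Assumed width in bits of the widest mode; the value is illustrative.  */
#define TOY_MAX_BITSIZE_MODE_ANY_MODE 256
/* Marker meaning "any offset": the widest mode's size in bytes.  */
#define TOY_ANY_OFFSET (TOY_MAX_BITSIZE_MODE_ANY_MODE / TOY_BITS_PER_UNIT)

/* True if OFFSET is a real byte offset into a value of INNER_SIZE bytes.
   No mode is wider than TOY_ANY_OFFSET bytes, so the marker is never a
   real offset.  */
bool
toy_offset_is_real (unsigned int offset, unsigned int inner_size)
{
  assert (inner_size <= TOY_ANY_OFFSET);
  return offset < inner_size;
}

/* Narrowing check in the shape of the i386.c hunk: the marker is
   nonzero, so "any offset" is treated conservatively.  */
bool
toy_narrowing_rejected (unsigned int offset, unsigned int from_size,
                        unsigned int to_size)
{
  return offset != 0 && to_size < from_size;
}
```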

There are no regressions on Linux/x86-64 with -m32 and -m64.
Can you check if it improves code quality on x86?

Thanks.


-- 
H.J.

[-- Attachment #2: 0001-Add-offset-to-CANNOT_CHANGE_MODE_CLASS.patch --]
[-- Type: text/plain, Size: 22485 bytes --]

From 9311df02699b5e8b101d5de07496492129522812 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 9 Dec 2013 10:46:24 -0800
Subject: [PATCH] Add offset to CANNOT_CHANGE_MODE_CLASS

---
 gcc/combine.c                 |  2 ++
 gcc/config/aarch64/aarch64.h  |  6 +++---
 gcc/config/alpha/alpha.h      |  2 +-
 gcc/config/arm/arm.h          |  8 ++++----
 gcc/config/i386/i386-protos.h |  4 +++-
 gcc/config/i386/i386.c        | 12 ++++++------
 gcc/config/i386/i386.h        |  7 ++++---
 gcc/config/i386/i386.md       |  1 +
 gcc/config/ia64/ia64.h        |  2 +-
 gcc/config/m32c/m32c.h        |  2 +-
 gcc/config/mep/mep.h          |  2 +-
 gcc/config/mips/mips.h        |  2 +-
 gcc/config/msp430/msp430.h    | 10 +++++-----
 gcc/config/pa/pa32-regs.h     |  2 +-
 gcc/config/pa/pa64-regs.h     |  2 +-
 gcc/config/pdp11/pdp11.h      |  2 +-
 gcc/config/rs6000/rs6000.h    |  2 +-
 gcc/config/s390/s390.h        |  2 +-
 gcc/config/score/score.h      |  4 ++--
 gcc/config/sh/sh.h            |  2 +-
 gcc/config/sparc/sparc.h      |  2 +-
 gcc/config/spu/spu.h          |  2 +-
 gcc/emit-rtl.c                |  2 +-
 gcc/hard-reg-set.h            |  6 +++---
 gcc/postreload.c              |  4 ++++
 gcc/recog.c                   |  3 ++-
 gcc/regcprop.c                |  4 +++-
 gcc/reginfo.c                 |  1 +
 gcc/reload.c                  | 10 +++++++---
 gcc/reload1.c                 | 10 +++++++---
 gcc/rtlanal.c                 |  2 +-
 31 files changed, 72 insertions(+), 50 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index c7eb5e5..4575b16 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5084,6 +5084,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 		      && REGNO (to) < FIRST_PSEUDO_REGISTER
 		      && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
 						   GET_MODE (to),
+						   SUBREG_BYTE (x),
 						   GET_MODE (x)))
 		    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
 #endif
@@ -6450,6 +6451,7 @@ simplify_set (rtx x)
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
 					 GET_MODE (SUBREG_REG (src)),
+					 SUBREG_BYTE (src),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index cead022..5b3bead 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -820,9 +820,9 @@ do {									     \
 
 /*  VFP registers may only be accessed in the mode they
    were set.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		\
-   ? reg_classes_intersect_p (FP_REGS, (CLASS))		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	\
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
+   ? reg_classes_intersect_p (FP_REGS, (CLASS))			\
    : 0)
 
 
diff --git a/gcc/config/alpha/alpha.h b/gcc/config/alpha/alpha.h
index 2e7c078..fbdcb2d 100644
--- a/gcc/config/alpha/alpha.h
+++ b/gcc/config/alpha/alpha.h
@@ -541,7 +541,7 @@ enum reg_class {
 
 /* Return the class of registers that cannot change mode from FROM to TO.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	\
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
    ? reg_classes_intersect_p (FLOAT_REGS, CLASS) : 0)
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8b8b80e..18341af 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1247,10 +1247,10 @@ enum reg_class
    In big-endian mode, modes greater than word size (i.e. DFmode) are stored in
    VFP registers in little-endian order.  We can't describe that accurately to
    GCC, so avoid taking subregs of such values.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
-  (TARGET_VFP && TARGET_BIG_END				\
-   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD		\
-       || GET_MODE_SIZE (TO) > UNITS_PER_WORD)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	\
+  (TARGET_VFP && TARGET_BIG_END					\
+   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD			\
+       || GET_MODE_SIZE (TO) > UNITS_PER_WORD)			\
    && reg_classes_intersect_p (VFP_REGS, (CLASS)))
 
 /* The class value for index registers, and the one for base regs.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 73feef2..0cbb9ae 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -167,7 +167,9 @@ extern bool ix86_modes_tieable_p (enum machine_mode, enum machine_mode);
 extern bool ix86_secondary_memory_needed (enum reg_class, enum reg_class,
 					  enum machine_mode, int);
 extern bool ix86_cannot_change_mode_class (enum machine_mode,
-					   enum machine_mode, enum reg_class);
+					   unsigned int,
+					   enum machine_mode,
+					   enum reg_class);
 
 extern int ix86_mode_needed (int, rtx);
 extern int ix86_mode_after (int, int, rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5dde632..f193b43 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35000,10 +35000,12 @@ ix86_class_max_nregs (reg_class_t rclass, enum machine_mode mode)
 }
 
 /* Return true if the registers in CLASS cannot represent the change from
-   modes FROM to TO.  */
+   modes FROM at offset OFFSET to TO.  */
 
 bool
-ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
+ix86_cannot_change_mode_class (enum machine_mode from,
+			       unsigned int offset,
+			       enum machine_mode to,
 			       enum reg_class regclass)
 {
   if (from == to)
@@ -35024,10 +35026,8 @@ ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
 	return true;
 
       /* Vector registers do not support subreg with nonzero offsets, which
-	 are otherwise valid for integer registers.  Since we can't see
-	 whether we have a nonzero offset from here, prohibit all
-         nonparadoxical subregs changing size.  */
-      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
+	 are otherwise valid for integer registers.  */
+      if (offset != 0 && GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
 	return true;
     }
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7efd1e0..1efd9f4 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1522,10 +1522,11 @@ enum reg_class
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
    : MODE)
 
-/* Return a class of registers that cannot change FROM mode to TO mode.  */
+/* Return a class of registers that cannot change FROM mode to TO mode
+   with OFFSET.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
-  ix86_cannot_change_mode_class (FROM, TO, CLASS)
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
+  ix86_cannot_change_mode_class (FROM, OFFSET, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7138868..c461e36 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3095,6 +3095,7 @@
 	{
 	case MODE_DI:
 	  return "movq\t{%1, %0|%0, %1}";
+	case MODE_SF:
 	case MODE_SI:
 	  return "movd\t{%1, %0|%0, %1}";
 
diff --git a/gcc/config/ia64/ia64.h b/gcc/config/ia64/ia64.h
index ae9027c..05455af 100644
--- a/gcc/config/ia64/ia64.h
+++ b/gcc/config/ia64/ia64.h
@@ -856,7 +856,7 @@ enum reg_class
    In FP regs, we can't change FP values to integer values and vice versa,
    but we can change e.g. DImode to SImode, and V2SFmode into DImode.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) 		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) 	\
   (reg_classes_intersect_p (CLASS, BR_REGS)			\
    ? (FROM) != (TO)						\
    : (SCALAR_FLOAT_MODE_P (FROM) != SCALAR_FLOAT_MODE_P (TO)	\
diff --git a/gcc/config/m32c/m32c.h b/gcc/config/m32c/m32c.h
index 3ceb093..497a743 100644
--- a/gcc/config/m32c/m32c.h
+++ b/gcc/config/m32c/m32c.h
@@ -415,7 +415,7 @@ enum reg_class
 
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P hook_bool_mode_true
 
-#define CANNOT_CHANGE_MODE_CLASS(F,T,C) m32c_cannot_change_mode_class(F,T,C)
+#define CANNOT_CHANGE_MODE_CLASS(F,O,T,C) m32c_cannot_change_mode_class(F,T,C)
 
 /* STACK AND CALLING */
 
diff --git a/gcc/config/mep/mep.h b/gcc/config/mep/mep.h
index 023d73c..4beac52 100644
--- a/gcc/config/mep/mep.h
+++ b/gcc/config/mep/mep.h
@@ -321,7 +321,7 @@ extern char mep_leaf_registers[];
 
 #define MODES_TIEABLE_P(MODE1, MODE2) 1
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   mep_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 enum reg_class
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 021419c..003ee12 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -2104,7 +2104,7 @@ enum reg_class
 
 #define CLASS_MAX_NREGS(CLASS, MODE) mips_class_max_nregs (CLASS, MODE)
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   mips_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 953c638..441bc21 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -394,11 +394,11 @@ typedef struct
   ((TARGET_LARGE && ((NREGS) <= 2)) ? PSImode : choose_hard_reg_mode ((REGNO), (NREGS), false))
 
 /* Also stop GCC from thinking that it can eliminate (SUBREG:PSI (SI)).  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM,TO,CLASS) \
-  (   ((TO) == PSImode && (FROM) == SImode)	\
-   || ((TO) == SImode  && (FROM) == PSImode)    \
-   || ((TO) == DImode  && (FROM) == PSImode)    \
-   || ((TO) == PSImode && (FROM) == DImode)     \
+#define CANNOT_CHANGE_MODE_CLASS(FROM,OFFSET,TO,CLASS) \
+  (   ((TO) == PSImode && (FROM) == SImode)	       \
+   || ((TO) == SImode  && (FROM) == PSImode)           \
+   || ((TO) == DImode  && (FROM) == PSImode)           \
+   || ((TO) == PSImode && (FROM) == DImode)            \
       )
 
 #define ACCUMULATE_OUTGOING_ARGS 1
diff --git a/gcc/config/pa/pa32-regs.h b/gcc/config/pa/pa32-regs.h
index 098e9ba..e053978 100644
--- a/gcc/config/pa/pa32-regs.h
+++ b/gcc/config/pa/pa32-regs.h
@@ -296,7 +296,7 @@ enum reg_class { NO_REGS, R1_REGS, GENERAL_REGS, FPUPPER_REGS, FP_REGS,
 
 /* Defines invalid mode changes.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   pa_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Return the class number of the smallest class containing
diff --git a/gcc/config/pa/pa64-regs.h b/gcc/config/pa/pa64-regs.h
index 002520a..df6ca4d 100644
--- a/gcc/config/pa/pa64-regs.h
+++ b/gcc/config/pa/pa64-regs.h
@@ -232,7 +232,7 @@ enum reg_class { NO_REGS, R1_REGS, GENERAL_REGS, FPUPPER_REGS, FP_REGS,
 
 /* Defines invalid mode changes.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   pa_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Return the class number of the smallest class containing
diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
index d4bc19a..492bf36 100644
--- a/gcc/config/pdp11/pdp11.h
+++ b/gcc/config/pdp11/pdp11.h
@@ -282,7 +282,7 @@ enum reg_class { NO_REGS, MUL_REGS, GENERAL_REGS, LOAD_FPU_REGS, NO_LOAD_FPU_REG
   1									\
 )
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   pdp11_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index eb59235..4807d63 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1505,7 +1505,7 @@ extern enum reg_class rs6000_constraints[RS6000_CONSTRAINT_MAX];
 
 /* Return nonzero if for CLASS a mode change from FROM to TO is invalid.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)			\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)		\
   rs6000_cannot_change_mode_class_ptr (FROM, TO, CLASS)
 
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index bca18fe..e38ca1f 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -419,7 +419,7 @@ enum processor_flags
    cannot use SUBREGs to switch between modes in FP registers.
    Likewise for access registers, since they have only half the
    word size on 64-bit.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		        \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	        \
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			        \
    ? ((reg_classes_intersect_p (FP_REGS, CLASS)				\
        && (GET_MODE_SIZE (FROM) < 8 || GET_MODE_SIZE (TO) < 8))		\
diff --git a/gcc/config/score/score.h b/gcc/config/score/score.h
index ca73401..8df1056 100644
--- a/gcc/config/score/score.h
+++ b/gcc/config/score/score.h
@@ -414,8 +414,8 @@ enum reg_class
 #define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X) \
   score_secondary_reload_class (CLASS, MODE, X)
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)    \
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)        \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)    \
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)                \
    ? reg_classes_intersect_p (HI_REG, (CLASS)) : 0)
 
 
diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index 9f07012..b35ce58 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -1149,7 +1149,7 @@ extern enum reg_class regno_reg_class[FIRST_PSEUDO_REGISTER];
    operand of a SUBREG that changes the mode of the object illegally.
    ??? We need to renumber the internal numbers for the frnn registers
    when in little endian in order to allow mode size changes.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   sh_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 7533e88..bddd29b 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -912,7 +912,7 @@ extern enum reg_class sparc_regno_reg_class[FIRST_PSEUDO_REGISTER];
    Likewise for SFmode, since word-mode paradoxical subregs are
    problematic on big-endian architectures.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	\
   (TARGET_ARCH64						\
    && GET_MODE_SIZE (FROM) == 4					\
    && GET_MODE_SIZE (TO) != 4					\
diff --git a/gcc/config/spu/spu.h b/gcc/config/spu/spu.h
index 64a2ba0..0e77250 100644
--- a/gcc/config/spu/spu.h
+++ b/gcc/config/spu/spu.h
@@ -226,7 +226,7 @@ enum reg_class {
 
 /* GCC assumes that modes are in the lowpart of a register, which is
    only true for SPU. */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
         ((GET_MODE_SIZE (FROM) > 4 || GET_MODE_SIZE (TO) > 4) \
 	 && (GET_MODE_SIZE (FROM) < 16 || GET_MODE_SIZE (TO) < 16) \
 	 && GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO))
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index d7fa3a5..b8e3dfd 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -748,7 +748,7 @@ validate_subreg (enum machine_mode omode, enum machine_mode imode,
       if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
 	  && GET_MODE_INNER (imode) == omode)
 	;
-      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, omode))
+      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, offset, omode))
 	return false;
 #endif
 
diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index 09a09c5..5140339 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -716,9 +716,9 @@ extern struct target_hard_regs *this_target_hard_regs;
 
 extern const char * reg_class_names[];
 
-/* Given a hard REGN a FROM mode and a TO mode, return nonzero if
+/* Given a hard REGN a FROM mode at OFFSET and a TO mode, return nonzero if
    REGN cannot change modes between the specified modes.  */
-#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, TO)                          \
-         CANNOT_CHANGE_MODE_CLASS (FROM, TO, REGNO_REG_CLASS (REGN))
+#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, OFFSET, TO)                  \
+         CANNOT_CHANGE_MODE_CLASS (FROM, OFFSET, TO, REGNO_REG_CLASS (REGN))
 
 #endif /* ! GCC_HARD_REG_SET_H */
diff --git a/gcc/postreload.c b/gcc/postreload.c
index b0c6342..6ecb7c9 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -349,6 +349,8 @@ reload_cse_simplify_set (rtx set, rtx insn)
 	      && extend_op != UNKNOWN
 #ifdef CANNOT_CHANGE_MODE_CLASS
 	      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
+					    (GET_CODE (SET_DEST (set)) == SUBREG
+					     ? SUBREG_BYTE (SET_DEST (set)) : 0),
 					    word_mode,
 					    REGNO_REG_CLASS (REGNO (SET_DEST (set))))
 #endif
@@ -459,6 +461,8 @@ reload_cse_simplify_operands (rtx insn, rtx testreg)
 	     it cannot have been used in word_mode.  */
 	  else if (REG_P (SET_DEST (set))
 		   && CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
+						(GET_CODE (SET_DEST (set)) == SUBREG
+						 ? SUBREG_BYTE (SET_DEST (set)) : 0),
 						word_mode,
 						REGNO_REG_CLASS (REGNO (SET_DEST (set)))))
 	    ; /* Continue ordinary processing.  */
diff --git a/gcc/recog.c b/gcc/recog.c
index 7f59756..85e13d3 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -1069,7 +1069,8 @@ register_operand (rtx op, enum machine_mode mode)
 #ifdef CANNOT_CHANGE_MODE_CLASS
       if (REG_P (sub)
 	  && REGNO (sub) < FIRST_PSEUDO_REGISTER
-	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub), mode)
+	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub),
+				       SUBREG_BYTE (op), mode)
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_INT
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT
 	  /* LRA can generate some invalid SUBREGS just for matched
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 9b52a63..8afcc5e 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -389,7 +389,9 @@ mode_change_ok (enum machine_mode orig_mode, enum machine_mode new_mode,
     return false;
 
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode, new_mode);
+  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode,
+				    (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				    new_mode);
 #endif
 
   return true;
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index db66a09..5dd652d 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1222,6 +1222,7 @@ record_subregs_of_mode (rtx subreg, bitmap subregs_of_mode)
 	if (!bitmap_bit_p (invalid_mode_changes,
 			   regno * N_REG_CLASSES + rclass)
 	    && CANNOT_CHANGE_MODE_CLASS (PSEUDO_REGNO_MODE (regno),
+					 (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
 					 mode, (enum reg_class) rclass))
 	  bitmap_set_bit (invalid_mode_changes,
 			  regno * N_REG_CLASSES + rclass);
diff --git a/gcc/reload.c b/gcc/reload.c
index 96619f6..487d4d4 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -1064,7 +1064,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (in != 0 && GET_CODE (in) == SUBREG
       && (subreg_lowpart_p (in) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)), inmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)),
+				    SUBREG_BYTE (in), inmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (in))]
       && (CONSTANT_P (SUBREG_REG (in))
@@ -1113,7 +1114,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	  || (REG_P (SUBREG_REG (in))
 	      && REGNO (SUBREG_REG (in)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P
-	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)), inmode))
+	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)),
+	       SUBREG_BYTE (in), inmode))
 #endif
 	  ))
     {
@@ -1174,7 +1176,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (out != 0 && GET_CODE (out) == SUBREG
       && (subreg_lowpart_p (out) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)), outmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)),
+				    SUBREG_BYTE (out), outmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (out))]
       && (CONSTANT_P (SUBREG_REG (out))
@@ -1209,6 +1212,7 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	      && REGNO (SUBREG_REG (out)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P (REGNO (SUBREG_REG (out)),
 					   GET_MODE (SUBREG_REG (out)),
+					   SUBREG_BYTE (out),
 					   outmode))
 #endif
 	  ))
diff --git a/gcc/reload1.c b/gcc/reload1.c
index 6864ec1..17b2c61 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -6609,7 +6609,7 @@ choose_reload_regs (struct insn_chain *chain)
 		     mode MODE.  */
 		  && !REG_CANNOT_CHANGE_MODE_P (REGNO (reg_last_reload_reg[regno]),
 						GET_MODE (reg_last_reload_reg[regno]),
-						mode)
+						byte, mode)
 #endif
 		  )
 		{
@@ -8080,8 +8080,12 @@ inherit_piecemeal_p (int dest ATTRIBUTE_UNUSED,
 		     enum machine_mode mode ATTRIBUTE_UNUSED)
 {
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode, reg_raw_mode[dest])
-	  && !REG_CANNOT_CHANGE_MODE_P (src, mode, reg_raw_mode[src]));
+  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode,
+				     (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				     reg_raw_mode[dest])
+	  && !REG_CANNOT_CHANGE_MODE_P (src, mode,
+					(MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+					reg_raw_mode[src]));
 #else
   return true;
 #endif
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 38f9e36..9687110 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3533,7 +3533,7 @@ simplify_subreg_regno (unsigned int xregno, enum machine_mode xmode,
   /* Give the backend a chance to disallow the mode change.  */
   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
       && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
-      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
+      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, offset, ymode)
       /* We can use mode change in LRA for some transformations.  */
       && ! lra_in_progress)
     return -1;
-- 
1.8.3.1



* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-09 22:08                                             ` H.J. Lu
@ 2013-12-10 14:53                                               ` Kirill Yukhin
  2013-12-10 16:52                                                 ` Paul_Koning
  2013-12-10 16:07                                               ` Kirill Yukhin
  1 sibling, 1 reply; 76+ messages in thread
From: Kirill Yukhin @ 2013-12-10 14:53 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Tejas Belagod, Yukhin, Kirill, Jeff Law, Bill Schmidt,
	gcc-patches, Richard Sandiford, Uros Bizjak, Richard Henderson,
	Jakub Jelinek

Hello,
On 09 Dec 14:08, H.J. Lu wrote:
> There are no regressions on Linux/x86-64 with -m32 and -m64.
> Can you check if it improves code quality on x86?

That is exactly what I was talking about. However, I wasn't sure
that we could change an already defined target hook (one that is
used throughout the ports).
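
For illustration only (this is not part of the patch): the offset-aware
rule for vector registers can be modeled in isolation as below. The
function name and the plain byte-size parameters are hypothetical
simplifications; the real hook takes machine modes and a register class
and performs further checks that are omitted here.

```c
#include <stdbool.h>

/* Simplified model of the vector-register rule in the patched hook:
   a narrowing mode change is rejected only when the subreg offset is
   nonzero.  Sizes and offset are in bytes.  Hypothetical helper, not
   a GCC function.  */
static bool
vector_reg_cannot_change_mode (unsigned from_size, unsigned offset,
                               unsigned to_size)
{
  /* Lowpart (offset 0) narrowing and all widening changes are allowed
     by this rule; only a narrowing access at a nonzero offset is not.  */
  return offset != 0 && to_size < from_size;
}
```

Under this model a 4-byte access at offset 0 of a 16-byte vector
register is permitted, while the same access at offset 4 is rejected.
Before the change, the offset was not visible to the hook, so both had
to be rejected.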

Anyway, this patch does not work for the given test, because combining
these insns is blocked:
(insn 2 4 3 2 (set (reg/v:V4SF 85 [ x ])
        (reg:V4SF 21 xmm0 [ x ]))
     (expr_list:REG_DEAD (reg:V4SF 21 xmm0 [ x ])
        (nil)))

(insn 6 3 11 2 (set (reg:SF 86 [ D.1819 ])
        (vec_select:SF (reg/v:V4SF 85 [ x ])
            (parallel [
                    (const_int 0 [0])
                ])))
     (expr_list:REG_DEAD (reg/v:V4SF 85 [ x ])
        (nil)))

(insn 11 6 14 2 (set (reg/i:SF 21 xmm0)
        (reg:SF 86 [ D.1819 ]))
     (expr_list:REG_DEAD (reg:SF 86 [ D.1819 ])
        (nil)))

This is because XMM0 is SSE_FIRST_REG, which is likely_spilled_p.
I suspect that is correct, since it is the return value register.
Anyway, we may change the test so that the VEC_SELECT does not involve
XMM0 and is successfully combined with its input and output,
resulting in this pattern:
(insn 9 8 10 2 (set (reg:SF 22 xmm1)
        (vec_select:SF (reg:V4SF 22 xmm1 [ y ])
            (parallel [
                    (const_int 0 [0])
                ])))
     (nil))
This insn is erased as a no-op with the patch.
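
As a rough sketch of why the pattern above counts as a no-op (an
illustration, not the patch itself): the check posted at the start of
this thread treats a vec_select as a no-op move when source and
destination are the same register and every selected lane i is lane i
of the source. The struct and function below are hypothetical stand-ins
for the RTL; they do not use GCC's rtx API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for
   (set (reg dst) (vec_select (reg src) (parallel [lanes...]))).  */
struct vec_select_move
{
  unsigned dst_regno;   /* destination hard register number */
  unsigned src_regno;   /* source vector register number */
  const int *lanes;     /* selected lane indices, in order */
  size_t n_lanes;       /* number of selected lanes */
};

/* Return true if the move leaves the register contents unchanged:
   same register on both sides, and lane i of the result is lane i
   of the source for every selected lane.  */
static bool
vec_select_move_is_noop (const struct vec_select_move *m)
{
  assert (m != NULL && (m->lanes != NULL || m->n_lanes == 0));

  if (m->src_regno != m->dst_regno)
    return false;

  for (size_t i = 0; i < m->n_lanes; i++)
    if (m->lanes[i] != (int) i)
      return false;

  return true;
}
```

With xmm1 (register 22) on both sides and a parallel of [0], as in
insn 9 above, this model reports a no-op. Selecting lane 1, or using
different source and destination registers, does not.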

Attached patch + updated test.

Note 1. It still does not work for 32-bit x86 because of the different
parameter passing. I could not come up with a solution :)
For x86_64 everything works fine.

Note 2. Even without the patch, such VEC_SELECTs are removed during the
split2 pass by the following split (sse.md):
(define_split
  [(set (match_operand:DF 0 "register_operand")
        (vec_select:DF
          (match_operand:V2DF 1 "nonimmediate_operand")
          (parallel [(const_int 0)])))]
  "TARGET_SSE2 && reload_completed"
  [(set (match_dup 0) (match_dup 1))]

But I believe that the earlier we get rid of redundant code, the better.

--
Thanks, K

---
 gcc/combine.c                             |  2 ++
 gcc/config/aarch64/aarch64.h              |  6 +++---
 gcc/config/alpha/alpha.h                  |  2 +-
 gcc/config/arm/arm.h                      |  8 ++++----
 gcc/config/i386/i386-protos.h             |  4 +++-
 gcc/config/i386/i386.c                    | 12 ++++++------
 gcc/config/i386/i386.h                    |  7 ++++---
 gcc/config/i386/i386.md                   |  1 +
 gcc/config/ia64/ia64.h                    |  2 +-
 gcc/config/m32c/m32c.h                    |  2 +-
 gcc/config/mep/mep.h                      |  2 +-
 gcc/config/mips/mips.h                    |  2 +-
 gcc/config/msp430/msp430.h                | 10 +++++-----
 gcc/config/pa/pa32-regs.h                 |  2 +-
 gcc/config/pa/pa64-regs.h                 |  2 +-
 gcc/config/pdp11/pdp11.h                  |  2 +-
 gcc/config/rs6000/rs6000.h                |  2 +-
 gcc/config/s390/s390.h                    |  2 +-
 gcc/config/score/score.h                  |  4 ++--
 gcc/config/sh/sh.h                        |  2 +-
 gcc/config/sparc/sparc.h                  |  2 +-
 gcc/config/spu/spu.h                      |  2 +-
 gcc/emit-rtl.c                            |  2 +-
 gcc/hard-reg-set.h                        |  6 +++---
 gcc/postreload.c                          |  4 ++++
 gcc/recog.c                               |  3 ++-
 gcc/regcprop.c                            |  4 +++-
 gcc/reginfo.c                             |  1 +
 gcc/reload.c                              | 10 +++++++---
 gcc/reload1.c                             | 10 +++++++---
 gcc/rtlanal.c                             |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-nop-move.c | 22 +++++++++++++++++-----
 32 files changed, 89 insertions(+), 55 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index c7eb5e5..4575b16 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5084,6 +5084,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 		      && REGNO (to) < FIRST_PSEUDO_REGISTER
 		      && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
 						   GET_MODE (to),
+						   SUBREG_BYTE (x),
 						   GET_MODE (x)))
 		    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
 #endif
@@ -6450,6 +6451,7 @@ simplify_set (rtx x)
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
 					 GET_MODE (SUBREG_REG (src)),
+					 SUBREG_BYTE (src),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index cead022..5b3bead 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -820,9 +820,9 @@ do {									     \
 
 /*  VFP registers may only be accessed in the mode they
    were set.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		\
-   ? reg_classes_intersect_p (FP_REGS, (CLASS))		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	\
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
+   ? reg_classes_intersect_p (FP_REGS, (CLASS))			\
    : 0)
 
 
diff --git a/gcc/config/alpha/alpha.h b/gcc/config/alpha/alpha.h
index 2e7c078..fbdcb2d 100644
--- a/gcc/config/alpha/alpha.h
+++ b/gcc/config/alpha/alpha.h
@@ -541,7 +541,7 @@ enum reg_class {
 
 /* Return the class of registers that cannot change mode from FROM to TO.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	\
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
    ? reg_classes_intersect_p (FLOAT_REGS, CLASS) : 0)
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8b8b80e..18341af 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1247,10 +1247,10 @@ enum reg_class
    In big-endian mode, modes greater than word size (i.e. DFmode) are stored in
    VFP registers in little-endian order.  We can't describe that accurately to
    GCC, so avoid taking subregs of such values.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
-  (TARGET_VFP && TARGET_BIG_END				\
-   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD		\
-       || GET_MODE_SIZE (TO) > UNITS_PER_WORD)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	\
+  (TARGET_VFP && TARGET_BIG_END					\
+   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD			\
+       || GET_MODE_SIZE (TO) > UNITS_PER_WORD)			\
    && reg_classes_intersect_p (VFP_REGS, (CLASS)))
 
 /* The class value for index registers, and the one for base regs.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 73feef2..0cbb9ae 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -167,7 +167,9 @@ extern bool ix86_modes_tieable_p (enum machine_mode, enum machine_mode);
 extern bool ix86_secondary_memory_needed (enum reg_class, enum reg_class,
 					  enum machine_mode, int);
 extern bool ix86_cannot_change_mode_class (enum machine_mode,
-					   enum machine_mode, enum reg_class);
+					   unsigned int,
+					   enum machine_mode,
+					   enum reg_class);
 
 extern int ix86_mode_needed (int, rtx);
 extern int ix86_mode_after (int, int, rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 382f8fb..3e5332d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34988,10 +34988,12 @@ ix86_class_max_nregs (reg_class_t rclass, enum machine_mode mode)
 }
 
 /* Return true if the registers in CLASS cannot represent the change from
-   modes FROM to TO.  */
+   modes FROM at offset OFFSET to TO.  */
 
 bool
-ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
+ix86_cannot_change_mode_class (enum machine_mode from,
+			       unsigned int offset,
+			       enum machine_mode to,
 			       enum reg_class regclass)
 {
   if (from == to)
@@ -35012,10 +35014,8 @@ ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
 	return true;
 
       /* Vector registers do not support subreg with nonzero offsets, which
-	 are otherwise valid for integer registers.  Since we can't see
-	 whether we have a nonzero offset from here, prohibit all
-         nonparadoxical subregs changing size.  */
-      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
+	 are otherwise valid for integer registers.  */
+      if (offset != 0 && GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
 	return true;
     }
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index db81aea..692fbcf 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1522,10 +1522,11 @@ enum reg_class
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
    : MODE)
 
-/* Return a class of registers that cannot change FROM mode to TO mode.  */
+/* Return a class of registers that cannot change FROM mode to TO mode
+   with OFFSET.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
-  ix86_cannot_change_mode_class (FROM, TO, CLASS)
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
+  ix86_cannot_change_mode_class (FROM, OFFSET, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7138868..c461e36 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3095,6 +3095,7 @@
 	{
 	case MODE_DI:
 	  return "movq\t{%1, %0|%0, %1}";
+	case MODE_SF:
 	case MODE_SI:
 	  return "movd\t{%1, %0|%0, %1}";
 
diff --git a/gcc/config/ia64/ia64.h b/gcc/config/ia64/ia64.h
index ae9027c..05455af 100644
--- a/gcc/config/ia64/ia64.h
+++ b/gcc/config/ia64/ia64.h
@@ -856,7 +856,7 @@ enum reg_class
    In FP regs, we can't change FP values to integer values and vice versa,
    but we can change e.g. DImode to SImode, and V2SFmode into DImode.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) 		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) 	\
   (reg_classes_intersect_p (CLASS, BR_REGS)			\
    ? (FROM) != (TO)						\
    : (SCALAR_FLOAT_MODE_P (FROM) != SCALAR_FLOAT_MODE_P (TO)	\
diff --git a/gcc/config/m32c/m32c.h b/gcc/config/m32c/m32c.h
index 3ceb093..497a743 100644
--- a/gcc/config/m32c/m32c.h
+++ b/gcc/config/m32c/m32c.h
@@ -415,7 +415,7 @@ enum reg_class
 
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P hook_bool_mode_true
 
-#define CANNOT_CHANGE_MODE_CLASS(F,T,C) m32c_cannot_change_mode_class(F,T,C)
+#define CANNOT_CHANGE_MODE_CLASS(F,O,T,C) m32c_cannot_change_mode_class(F,T,C)
 
 /* STACK AND CALLING */
 
diff --git a/gcc/config/mep/mep.h b/gcc/config/mep/mep.h
index 023d73c..4beac52 100644
--- a/gcc/config/mep/mep.h
+++ b/gcc/config/mep/mep.h
@@ -321,7 +321,7 @@ extern char mep_leaf_registers[];
 
 #define MODES_TIEABLE_P(MODE1, MODE2) 1
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   mep_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 enum reg_class
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 021419c..003ee12 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -2104,7 +2104,7 @@ enum reg_class
 
 #define CLASS_MAX_NREGS(CLASS, MODE) mips_class_max_nregs (CLASS, MODE)
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   mips_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 953c638..441bc21 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -394,11 +394,11 @@ typedef struct
   ((TARGET_LARGE && ((NREGS) <= 2)) ? PSImode : choose_hard_reg_mode ((REGNO), (NREGS), false))
 
 /* Also stop GCC from thinking that it can eliminate (SUBREG:PSI (SI)).  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM,TO,CLASS) \
-  (   ((TO) == PSImode && (FROM) == SImode)	\
-   || ((TO) == SImode  && (FROM) == PSImode)    \
-   || ((TO) == DImode  && (FROM) == PSImode)    \
-   || ((TO) == PSImode && (FROM) == DImode)     \
+#define CANNOT_CHANGE_MODE_CLASS(FROM,OFFSET,TO,CLASS) \
+  (   ((TO) == PSImode && (FROM) == SImode)	       \
+   || ((TO) == SImode  && (FROM) == PSImode)           \
+   || ((TO) == DImode  && (FROM) == PSImode)           \
+   || ((TO) == PSImode && (FROM) == DImode)            \
       )
 
 #define ACCUMULATE_OUTGOING_ARGS 1
diff --git a/gcc/config/pa/pa32-regs.h b/gcc/config/pa/pa32-regs.h
index 098e9ba..e053978 100644
--- a/gcc/config/pa/pa32-regs.h
+++ b/gcc/config/pa/pa32-regs.h
@@ -296,7 +296,7 @@ enum reg_class { NO_REGS, R1_REGS, GENERAL_REGS, FPUPPER_REGS, FP_REGS,
 
 /* Defines invalid mode changes.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   pa_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Return the class number of the smallest class containing
diff --git a/gcc/config/pa/pa64-regs.h b/gcc/config/pa/pa64-regs.h
index 002520a..df6ca4d 100644
--- a/gcc/config/pa/pa64-regs.h
+++ b/gcc/config/pa/pa64-regs.h
@@ -232,7 +232,7 @@ enum reg_class { NO_REGS, R1_REGS, GENERAL_REGS, FPUPPER_REGS, FP_REGS,
 
 /* Defines invalid mode changes.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   pa_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Return the class number of the smallest class containing
diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
index d4bc19a..492bf36 100644
--- a/gcc/config/pdp11/pdp11.h
+++ b/gcc/config/pdp11/pdp11.h
@@ -282,7 +282,7 @@ enum reg_class { NO_REGS, MUL_REGS, GENERAL_REGS, LOAD_FPU_REGS, NO_LOAD_FPU_REG
   1									\
 )
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   pdp11_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index eb59235..4807d63 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1505,7 +1505,7 @@ extern enum reg_class rs6000_constraints[RS6000_CONSTRAINT_MAX];
 
 /* Return nonzero if for CLASS a mode change from FROM to TO is invalid.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)			\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)		\
   rs6000_cannot_change_mode_class_ptr (FROM, TO, CLASS)
 
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index bca18fe..e38ca1f 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -419,7 +419,7 @@ enum processor_flags
    cannot use SUBREGs to switch between modes in FP registers.
    Likewise for access registers, since they have only half the
    word size on 64-bit.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		        \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	        \
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			        \
    ? ((reg_classes_intersect_p (FP_REGS, CLASS)				\
        && (GET_MODE_SIZE (FROM) < 8 || GET_MODE_SIZE (TO) < 8))		\
diff --git a/gcc/config/score/score.h b/gcc/config/score/score.h
index ca73401..8df1056 100644
--- a/gcc/config/score/score.h
+++ b/gcc/config/score/score.h
@@ -414,8 +414,8 @@ enum reg_class
 #define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X) \
   score_secondary_reload_class (CLASS, MODE, X)
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)    \
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)        \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)    \
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)                \
    ? reg_classes_intersect_p (HI_REG, (CLASS)) : 0)
 
 
diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index 9f07012..b35ce58 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -1149,7 +1149,7 @@ extern enum reg_class regno_reg_class[FIRST_PSEUDO_REGISTER];
    operand of a SUBREG that changes the mode of the object illegally.
    ??? We need to renumber the internal numbers for the frnn registers
    when in little endian in order to allow mode size changes.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
   sh_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index d96c1b6..40e1e59 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -912,7 +912,7 @@ extern enum reg_class sparc_regno_reg_class[FIRST_PSEUDO_REGISTER];
    Likewise for SFmode, since word-mode paradoxical subregs are
    problematic on big-endian architectures.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS)	\
   (TARGET_ARCH64						\
    && GET_MODE_SIZE (FROM) == 4					\
    && GET_MODE_SIZE (TO) != 4					\
diff --git a/gcc/config/spu/spu.h b/gcc/config/spu/spu.h
index 64a2ba0..0e77250 100644
--- a/gcc/config/spu/spu.h
+++ b/gcc/config/spu/spu.h
@@ -226,7 +226,7 @@ enum reg_class {
 
 /* GCC assumes that modes are in the lowpart of a register, which is
    only true for SPU. */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
         ((GET_MODE_SIZE (FROM) > 4 || GET_MODE_SIZE (TO) > 4) \
 	 && (GET_MODE_SIZE (FROM) < 16 || GET_MODE_SIZE (TO) < 16) \
 	 && GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO))
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index d7fa3a5..b8e3dfd 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -748,7 +748,7 @@ validate_subreg (enum machine_mode omode, enum machine_mode imode,
       if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
 	  && GET_MODE_INNER (imode) == omode)
 	;
-      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, omode))
+      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, offset, omode))
 	return false;
 #endif
 
diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index 09a09c5..5140339 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -716,9 +716,9 @@ extern struct target_hard_regs *this_target_hard_regs;
 
 extern const char * reg_class_names[];
 
-/* Given a hard REGN a FROM mode and a TO mode, return nonzero if
+/* Given a hard REGN a FROM mode at OFFSET and a TO mode, return nonzero if
    REGN cannot change modes between the specified modes.  */
-#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, TO)                          \
-         CANNOT_CHANGE_MODE_CLASS (FROM, TO, REGNO_REG_CLASS (REGN))
+#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, OFFSET, TO)                  \
+         CANNOT_CHANGE_MODE_CLASS (FROM, OFFSET, TO, REGNO_REG_CLASS (REGN))
 
 #endif /* ! GCC_HARD_REG_SET_H */
diff --git a/gcc/postreload.c b/gcc/postreload.c
index b0c6342..6ecb7c9 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -349,6 +349,8 @@ reload_cse_simplify_set (rtx set, rtx insn)
 	      && extend_op != UNKNOWN
 #ifdef CANNOT_CHANGE_MODE_CLASS
 	      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
+					    (GET_CODE (SET_DEST (set)) == SUBREG
+					     ? SUBREG_BYTE (SET_DEST (set)) : 0),
 					    word_mode,
 					    REGNO_REG_CLASS (REGNO (SET_DEST (set))))
 #endif
@@ -459,6 +461,8 @@ reload_cse_simplify_operands (rtx insn, rtx testreg)
 	     it cannot have been used in word_mode.  */
 	  else if (REG_P (SET_DEST (set))
 		   && CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
+						(GET_CODE (SET_DEST (set)) == SUBREG
+						 ? SUBREG_BYTE (SET_DEST (set)) : 0),
 						word_mode,
 						REGNO_REG_CLASS (REGNO (SET_DEST (set)))))
 	    ; /* Continue ordinary processing.  */
diff --git a/gcc/recog.c b/gcc/recog.c
index 7f59756..85e13d3 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -1069,7 +1069,8 @@ register_operand (rtx op, enum machine_mode mode)
 #ifdef CANNOT_CHANGE_MODE_CLASS
       if (REG_P (sub)
 	  && REGNO (sub) < FIRST_PSEUDO_REGISTER
-	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub), mode)
+	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub),
+				       SUBREG_BYTE (op), mode)
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_INT
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT
 	  /* LRA can generate some invalid SUBREGS just for matched
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 9b52a63..8afcc5e 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -389,7 +389,9 @@ mode_change_ok (enum machine_mode orig_mode, enum machine_mode new_mode,
     return false;
 
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode, new_mode);
+  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode,
+				    (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				    new_mode);
 #endif
 
   return true;
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index db66a09..5dd652d 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1222,6 +1222,7 @@ record_subregs_of_mode (rtx subreg, bitmap subregs_of_mode)
 	if (!bitmap_bit_p (invalid_mode_changes,
 			   regno * N_REG_CLASSES + rclass)
 	    && CANNOT_CHANGE_MODE_CLASS (PSEUDO_REGNO_MODE (regno),
+					 (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
 					 mode, (enum reg_class) rclass))
 	  bitmap_set_bit (invalid_mode_changes,
 			  regno * N_REG_CLASSES + rclass);
diff --git a/gcc/reload.c b/gcc/reload.c
index 96619f6..487d4d4 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -1064,7 +1064,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (in != 0 && GET_CODE (in) == SUBREG
       && (subreg_lowpart_p (in) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)), inmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)),
+				    SUBREG_BYTE (in), inmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (in))]
       && (CONSTANT_P (SUBREG_REG (in))
@@ -1113,7 +1114,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	  || (REG_P (SUBREG_REG (in))
 	      && REGNO (SUBREG_REG (in)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P
-	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)), inmode))
+	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)),
+	       SUBREG_BYTE (in), inmode))
 #endif
 	  ))
     {
@@ -1174,7 +1176,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (out != 0 && GET_CODE (out) == SUBREG
       && (subreg_lowpart_p (out) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)), outmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)),
+				    SUBREG_BYTE (out), outmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (out))]
       && (CONSTANT_P (SUBREG_REG (out))
@@ -1209,6 +1212,7 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	      && REGNO (SUBREG_REG (out)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P (REGNO (SUBREG_REG (out)),
 					   GET_MODE (SUBREG_REG (out)),
+					   SUBREG_BYTE (out),
 					   outmode))
 #endif
 	  ))
diff --git a/gcc/reload1.c b/gcc/reload1.c
index 6864ec1..17b2c61 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -6609,7 +6609,7 @@ choose_reload_regs (struct insn_chain *chain)
 		     mode MODE.  */
 		  && !REG_CANNOT_CHANGE_MODE_P (REGNO (reg_last_reload_reg[regno]),
 						GET_MODE (reg_last_reload_reg[regno]),
-						mode)
+						byte, mode)
 #endif
 		  )
 		{
@@ -8080,8 +8080,12 @@ inherit_piecemeal_p (int dest ATTRIBUTE_UNUSED,
 		     enum machine_mode mode ATTRIBUTE_UNUSED)
 {
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode, reg_raw_mode[dest])
-	  && !REG_CANNOT_CHANGE_MODE_P (src, mode, reg_raw_mode[src]));
+  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode,
+				     (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				     reg_raw_mode[dest])
+	  && !REG_CANNOT_CHANGE_MODE_P (src, mode,
+					(MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+					reg_raw_mode[src]));
 #else
   return true;
 #endif
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 38f9e36..9687110 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3533,7 +3533,7 @@ simplify_subreg_regno (unsigned int xregno, enum machine_mode xmode,
   /* Give the backend a chance to disallow the mode change.  */
   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
       && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
-      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
+      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, offset, ymode)
       /* We can use mode change in LRA for some transformations.  */
       && ! lra_in_progress)
     return -1;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
index 1941933..76c07bd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */ 
+/* { dg-do run } */
 /* { dg-require-effective-target vect_float } */
 /* { dg-options "-O3 -fdump-rtl-combine-details" } */
 
@@ -16,9 +16,15 @@ foo32x4_be (float32x4_t x)
 }
 
 NOINLINE float
-foo32x4_le (float32x4_t x)
+bar_2 (float a, float b)
 {
-  return x[0];
+  return a;
+}
+
+NOINLINE float
+foo32x4_le (float32x4_t x, float32x4_t y)
+{
+  return bar_2 (x[0], y[0]);
 }
 
 NOINLINE float
@@ -30,12 +36,18 @@ bar (float a)
 NOINLINE float
 foo32x2_be (float32x2_t x)
 {
+#ifdef __i386__
+  __builtin_ia32_emms ();
+#endif
   return bar (x[1]);
 }
 
 NOINLINE float
 foo32x2_le (float32x2_t x)
 {
+#ifdef __i386__
+  __builtin_ia32_emms ();
+#endif
   return bar (x[0]);
 }
 
@@ -48,7 +60,7 @@ main()
   if (foo32x4_be (a) != 3.0f)
     abort ();
 
-  if (foo32x4_le (a) != 0.0f)
+  if (foo32x4_le (a, a) != 0.0f)
     abort ();
 
   if (foo32x2_be (b) != 1.0f)
@@ -60,5 +72,5 @@ main()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target aarch64*-*-* } } } */
+/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target aarch64*-*-* x86_64-*-*} } } */
 /* { dg-final { cleanup-rtl-dump "combine" } } */

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-09 22:08                                             ` H.J. Lu
  2013-12-10 14:53                                               ` Kirill Yukhin
@ 2013-12-10 16:07                                               ` Kirill Yukhin
  2013-12-10 16:24                                                 ` H.J. Lu
  2013-12-10 17:02                                                 ` [Patch, RTL] Eliminate redundant vec_select moves H.J. Lu
  1 sibling, 2 replies; 76+ messages in thread
From: Kirill Yukhin @ 2013-12-10 16:07 UTC (permalink / raw)
  To: H.J. Lu, rth
  Cc: Tejas Belagod, Yukhin, Kirill, Jeff Law, Bill Schmidt,
	gcc-patches, Richard Sandiford, Uros Bizjak, Richard Henderson,
	Jakub Jelinek

On 09 Dec 14:08, H.J. Lu wrote:
> 
> There are no regressions on Linux/x86-64 with -m32 and -m64.
> Can you check if it improves code quality on x86?

On second thought: if Tejas and Richard are right and it is simply incorrect
to check any offsets in this hook, maybe we can end up with the patch at the
bottom?

Test is passing (however I still don't know how to prohibit it for 32 bit x86),
bootstrap in progress.

Ideas?

This change belongs to rth.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 382f8fb..0d0bb67 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35002,22 +35002,13 @@ ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
   if (MAYBE_FLOAT_CLASS_P (regclass))
     return true;
 
-  if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
-    {
-      /* Vector registers do not support QI or HImode loads.  If we don't
-	 disallow a change to these modes, reload will assume it's ok to
-	 drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
-	 the vec_dupv4hi pattern.  */
-      if (GET_MODE_SIZE (from) < 4)
-	return true;
-
-      /* Vector registers do not support subreg with nonzero offsets, which
-	 are otherwise valid for integer registers.  Since we can't see
-	 whether we have a nonzero offset from here, prohibit all
-         nonparadoxical subregs changing size.  */
-      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
-	return true;
-    }
+  /* Vector registers do not support QI or HImode loads.  If we don't
+     disallow a change to these modes, reload will assume it's ok to
+     drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
+     the vec_dupv4hi pattern.  */
+  if ((MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
+      && (GET_MODE_SIZE (from) < 4))
+    return true;
 
   return false;
 }
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
index 1941933..e863c1b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */ 
+/* { dg-do run } */
 /* { dg-require-effective-target vect_float } */
 /* { dg-options "-O3 -fdump-rtl-combine-details" } */
 
@@ -16,9 +16,15 @@ foo32x4_be (float32x4_t x)
 }
 
 NOINLINE float
-foo32x4_le (float32x4_t x)
+bar_2 (float a, float b)
 {
-  return x[0];
+  return a;
+}
+
+NOINLINE float
+foo32x4_le (float32x4_t x, float32x4_t y)
+{
+  return bar_2 (x[0], y[0]);
 }
 
 NOINLINE float
@@ -30,12 +36,18 @@ bar (float a)
 NOINLINE float
 foo32x2_be (float32x2_t x)
 {
+#ifdef __i386__
+  __builtin_ia32_emms ();
+#endif
   return bar (x[1]);
 }
 
 NOINLINE float
 foo32x2_le (float32x2_t x)
 {
+#ifdef __i386__
+  __builtin_ia32_emms ();
+#endif
   return bar (x[0]);
 }
 
@@ -48,7 +60,7 @@ main()
   if (foo32x4_be (a) != 3.0f)
     abort ();
 
-  if (foo32x4_le (a) != 0.0f)
+  if (foo32x4_le (a, a) != 0.0f)
     abort ();
 
   if (foo32x2_be (b) != 1.0f)
@@ -60,5 +72,5 @@ main()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target aarch64*-*-* } } } */
+/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target { aarch64*-*-* || x86_64-*-* } } } } */
 /* { dg-final { cleanup-rtl-dump "combine" } } */

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 16:07                                               ` Kirill Yukhin
@ 2013-12-10 16:24                                                 ` H.J. Lu
  2013-12-10 17:07                                                   ` Kirill Yukhin
  2013-12-10 17:57                                                   ` Richard Sandiford
  2013-12-10 17:02                                                 ` [Patch, RTL] Eliminate redundant vec_select moves H.J. Lu
  1 sibling, 2 replies; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 16:24 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Richard Henderson, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> On 09 Dec 14:08, H.J. Lu wrote:
>>
>> There are no regressions on Linux/x86-64 with -m32 and -m64.
>> Can you check if it improves code quality on x86?
>
> On second thought: if Tejas and Richard are right and it is simply incorrect
> to check any offsets in this hook, maybe we can end up with the patch at the
> bottom?

What is wrong to pass the correct offset to
CANNOT_CHANGE_MODE_CLASS?  Backends are free to
ignore it.
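
As an illustration of the "free to ignore it" point, here is a minimal
sketch (not part of any patch in this thread) of a four-argument macro
forwarding to a three-argument helper, much as the sh.h hunk does.  The
TOY_/toy_ names and the use of byte sizes in place of machine modes are
assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a backend helper: modes are represented
   only by their size in bytes, and the register class by a flag saying
   whether it contains vector registers.  */
static bool
toy_cannot_change_mode_class (unsigned from_size, unsigned to_size,
			      bool vector_class)
{
  assert (from_size > 0 && to_size > 0);

  /* Reject any size-changing access to the toy vector class.  */
  return vector_class && from_size != to_size;
}

/* The macro accepts OFFSET but never expands it, so a port written
   this way behaves exactly as it did with the three-argument form.  */
#define TOY_CANNOT_CHANGE_MODE_CLASS(FROM, OFFSET, TO, CLASS) \
  toy_cannot_change_mode_class (FROM, TO, CLASS)

/* Returns true when the answer is the same for two different offsets.  */
static bool
toy_offset_is_ignored (unsigned from_size, unsigned to_size)
{
  return (TOY_CANNOT_CHANGE_MODE_CLASS (from_size, 0, to_size, true)
	  == TOY_CANNOT_CHANGE_MODE_CLASS (from_size, 8, to_size, true));
}
```

A port written like this sees no behavioural change from the extra
argument; only ports that choose to expand OFFSET are affected.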

>
> Test is passing (however I still don't know how to prohibit it for 32 bit x86),
> bootstrap in progress.
>
> Ideas?
>
> This change belongs to rth.
>
> --
> Thanks, K
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 382f8fb..0d0bb67 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -35002,22 +35002,13 @@ ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
>    if (MAYBE_FLOAT_CLASS_P (regclass))
>      return true;
>
> -  if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
> -    {
> -      /* Vector registers do not support QI or HImode loads.  If we don't
> -        disallow a change to these modes, reload will assume it's ok to
> -        drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
> -        the vec_dupv4hi pattern.  */
> -      if (GET_MODE_SIZE (from) < 4)
> -       return true;
> -
> -      /* Vector registers do not support subreg with nonzero offsets, which
> -        are otherwise valid for integer registers.  Since we can't see
> -        whether we have a nonzero offset from here, prohibit all
> -         nonparadoxical subregs changing size.  */
> -      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
> -       return true;
> -    }
> +  /* Vector registers do not support QI or HImode loads.  If we don't
> +     disallow a change to these modes, reload will assume it's ok to
> +     drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
> +     the vec_dupv4hi pattern.  */
> +  if ((MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
> +      && (GET_MODE_SIZE (from) < 4))
> +    return true;
>
>

You need to run full "make check" for both -m32 and -m64.

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 14:53                                               ` Kirill Yukhin
@ 2013-12-10 16:52                                                 ` Paul_Koning
  0 siblings, 0 replies; 76+ messages in thread
From: Paul_Koning @ 2013-12-10 16:52 UTC (permalink / raw)
  To: kirill.yukhin; +Cc: gcc-patches


On Dec 10, 2013, at 9:50 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:

> Hello,
> On 09 Dec 14:08, H.J. Lu wrote:
>> There are no regressions on Linux/x86-64 with -m32 and -m64.
>> Can you check if it improves code quality on x86?
> 
> That is exactly what I was talking about. However I wasn't sure
> that we can change already defined (and used throughout ports)
> target hook.
> 
> ...
> 
> Attached patch + updated test.

You're missing the documentation change needed for this.

	paul

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 16:07                                               ` Kirill Yukhin
  2013-12-10 16:24                                                 ` H.J. Lu
@ 2013-12-10 17:02                                                 ` H.J. Lu
  2013-12-10 17:11                                                   ` Kirill Yukhin
  1 sibling, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 17:02 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Richard Henderson, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> On 09 Dec 14:08, H.J. Lu wrote:
>>
>> There are no regressions on Linux/x86-64 with -m32 and -m64.
>> Can you check if it improves code quality on x86?
>
> On second thought: if Tejas and Richard are right and it is simply incorrect
> to check any offsets in this hook, maybe we can end up with the patch at the
> bottom?
>
> Test is passing (however I still don't know how to prohibit it for 32 bit x86),
> bootstrap in progress.
>
> Ideas?
>
> This change belongs to rth.


>
>  NOINLINE float
>  foo32x2_le (float32x2_t x)
>  {
> +#ifdef __i386__
> +  __builtin_ia32_emms ();
> +#endif
>    return bar (x[0]);
>  }
>

You should check both __i386__ and __x86_64__.


-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 16:24                                                 ` H.J. Lu
@ 2013-12-10 17:07                                                   ` Kirill Yukhin
  2013-12-10 17:14                                                     ` H.J. Lu
  2013-12-10 17:57                                                   ` Richard Sandiford
  1 sibling, 1 reply; 76+ messages in thread
From: Kirill Yukhin @ 2013-12-10 17:07 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Henderson, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

On 10 Dec 08:23, H.J. Lu wrote:
> What is wrong to pass the correct offset to
> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
> ignore it.

Yes, but as far as I understand, this hook is a predicate
saying whether it is unsafe to change mode1 to mode2 for a given
register class. I don't think that offsets should be
involved here. IMHO it is safe to change V4SF->SF
for SSE/AVX register...

--
Thanks, K

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 17:02                                                 ` [Patch, RTL] Eliminate redundant vec_select moves H.J. Lu
@ 2013-12-10 17:11                                                   ` Kirill Yukhin
  2013-12-10 17:12                                                     ` H.J. Lu
  0 siblings, 1 reply; 76+ messages in thread
From: Kirill Yukhin @ 2013-12-10 17:11 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Henderson, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

On 10 Dec 09:02, H.J. Lu wrote:
> On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> > On 09 Dec 14:08, H.J. Lu wrote:
> >  NOINLINE float
> >  foo32x2_le (float32x2_t x)
> >  {
> > +#ifdef __i386__
> > +  __builtin_ia32_emms ();
> > +#endif
> >    return bar (x[0]);
> >  }
> You should check both __i386__ and __x86_64__.
Why? I thought that we pass using MMX only in 32-bit mode.
This built-in is useless on 64-bit x86.

K

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 17:11                                                   ` Kirill Yukhin
@ 2013-12-10 17:12                                                     ` H.J. Lu
  0 siblings, 0 replies; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 17:12 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Richard Henderson, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

On Tue, Dec 10, 2013 at 9:09 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> On 10 Dec 09:02, H.J. Lu wrote:
>> On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>> > On 09 Dec 14:08, H.J. Lu wrote:
>> >  NOINLINE float
>> >  foo32x2_le (float32x2_t x)
>> >  {
>> > +#ifdef __i386__
>> > +  __builtin_ia32_emms ();
>> > +#endif
>> >    return bar (x[0]);
>> >  }
>> You should check both __i386__ and __x86_64__.
> Why? I thought that we pass using MMX only in 32-bit mode.
> This built-in is useless on 64-bit x86.
>

Can you double check it?

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 17:07                                                   ` Kirill Yukhin
@ 2013-12-10 17:14                                                     ` H.J. Lu
  2013-12-10 17:26                                                       ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 17:14 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Richard Henderson, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

On Tue, Dec 10, 2013 at 9:04 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> On 10 Dec 08:23, H.J. Lu wrote:
>> What is wrong to pass the correct offset to
>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>> ignore it.
>
> Yes, but as far as I understand, this hook is a predicate
> saying whether it is unsafe to change mode1 to mode2 for a given

In many places, the macro is used with the known offset.
I have a follow up patch which improves x86 code generation,
in cases like:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45198

> register class. I don't think that offsets should be
> involved here. IMHO it is safe to change V4SF->SF
> for SSE/AVX register...


-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 17:14                                                     ` H.J. Lu
@ 2013-12-10 17:26                                                       ` Tejas Belagod
  2013-12-10 17:39                                                         ` H.J. Lu
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-12-10 17:26 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Kirill Yukhin, Richard Henderson, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

H.J. Lu wrote:
> On Tue, Dec 10, 2013 at 9:04 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>> On 10 Dec 08:23, H.J. Lu wrote:
>>> What is wrong to pass the correct offset to
>>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>>> ignore it.
>> Yes, but as far as I understand, this hook is a predicate
>> saying whether it is unsafe to change mode1 to mode2 for a given
> 
> In many places, the macro is used with the known offset.
> I have a follow up patch which improves x86 code generation,
> in cases like:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45198
> 

What is it that subreg_get_info () can't resolve that CANNOT_CHANGE_MODE_CLASS 
with an offset can?

Thanks,
Tejas.

>> register class. I don't think that offsets should be
>> involved here. IMHO it is safe to change V4SF->SF
>> for SSE/AVX register...
> 
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 17:26                                                       ` Tejas Belagod
@ 2013-12-10 17:39                                                         ` H.J. Lu
  2013-12-10 19:05                                                           ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 17:39 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Kirill Yukhin, Richard Henderson, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

On Tue, Dec 10, 2013 at 9:26 AM, Tejas Belagod <tbelagod@arm.com> wrote:
> H.J. Lu wrote:
>>
>> On Tue, Dec 10, 2013 at 9:04 AM, Kirill Yukhin <kirill.yukhin@gmail.com>
>> wrote:
>>>
>>> On 10 Dec 08:23, H.J. Lu wrote:
>>>>
>>>> What is wrong to pass the correct offset to
>>>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>>>> ignore it.
>>>
>>> Yes, but as far as I understand, this hook is a predicate
>>> saying whether it is unsafe to change mode1 to mode2 for a given
>>
>>
>> In many places, the macro is used with the known offset.
>> I have a follow up patch which improves x86 code generation,
>> in cases like:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45198
>>
>
> What is it that subreg_get_info () can't resolve that
> CANNOT_CHANGE_MODE_CLASS with an offset can?
>

We have

int
simplify_subreg_regno (unsigned int xregno, enum machine_mode xmode,
                       unsigned int offset, enum machine_mode ymode)
{
  struct subreg_info info;
  unsigned int yregno;

#ifdef CANNOT_CHANGE_MODE_CLASS
  /* Give the backend a chance to disallow the mode change.  */
  if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
      && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
      /* We can use mode change in LRA for some transformations.  */
      && ! lra_in_progress)
    return -1;
#endif

CANNOT_CHANGE_MODE_CLASS is checked before subreg_get_info is
called.  When REG_CANNOT_CHANGE_MODE_P returns true,
simplify_subreg_regno returns -1 and there is nothing subreg_get_info can do.
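
A toy model of that ordering, assuming little more than "a veto runs
before the register computation".  Nothing here is GCC code: the toy_
names, the function-pointer hook and the byte sizes standing in for
modes are all hypothetical.  It contrasts a veto that cannot see the
offset with one that can.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical veto hook; sizes are in bytes and stand in for modes.  */
typedef bool (*toy_veto_fn) (unsigned from_size, unsigned offset,
			     unsigned to_size);

/* Offset-blind veto: it cannot tell offset 0 from any other offset,
   so it rejects every size-reducing access.  */
static bool
toy_veto_blind (unsigned from_size, unsigned offset, unsigned to_size)
{
  (void) offset;
  return to_size < from_size;
}

/* Offset-aware veto: a size-reducing access is rejected only when its
   offset is nonzero.  */
static bool
toy_veto_aware (unsigned from_size, unsigned offset, unsigned to_size)
{
  return to_size < from_size && offset != 0;
}

/* Toy analogue of the early exit: the veto runs first, and the later
   register computation (here simply "same register") is never reached
   when the veto fires.  Returns -1 on rejection.  */
static int
toy_simplify_subreg_regno (unsigned regno, unsigned from_size,
			   unsigned offset, unsigned to_size,
			   toy_veto_fn veto)
{
  assert (veto != 0);

  if (veto (from_size, offset, to_size))
    return -1;
  return (int) regno;
}
```

With the blind veto a 16-byte to 8-byte access at offset 0 is refused;
with the aware veto the same access goes through and only the nonzero
offsets are refused.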

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 16:24                                                 ` H.J. Lu
  2013-12-10 17:07                                                   ` Kirill Yukhin
@ 2013-12-10 17:57                                                   ` Richard Sandiford
  2013-12-10 18:21                                                     ` H.J. Lu
  1 sibling, 1 reply; 76+ messages in thread
From: Richard Sandiford @ 2013-12-10 17:57 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Kirill Yukhin, Richard Henderson, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek

"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>> On 09 Dec 14:08, H.J. Lu wrote:
>>>
>>> There are no regressions on Linux/x86-64 with -m32 and -m64.
>>> Can you check if it improves code quality on x86?
>>
>> On second thought: if Tejas and Richard are right and it is simply incorrect
>> to check any offsets in this hook, maybe we can end up with the patch at the
>> bottom?
>
> What is wrong to pass the correct offset to
> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
> ignore it.

The point is that:

>> -      /* Vector registers do not support subreg with nonzero offsets, which
>> -        are otherwise valid for integer registers.  Since we can't see
>> -        whether we have a nonzero offset from here, prohibit all
>> -         nonparadoxical subregs changing size.  */
>> -      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
>> -       return true;

seems to be trying to reject things like (subreg:SF (reg:V4SF X) 1),
which is always invalid for a single-register V4SF.  See:

    http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00824.html

for the longer version.
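
A rough sketch of why such a subreg cannot name a hard register, under
stated assumptions: little-endian numbering, modes reduced to byte
sizes, and a piece counted as representable only if it starts on a
hard-register boundary.  This is a simplified model, not
subreg_get_info itself, and toy_hard_subreg_regno is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: an INNER_SIZE-byte value lives in consecutive hard
   registers of REG_SIZE bytes each, starting at REGNO.  Return the
   register holding the OUTER_SIZE-byte piece at byte OFFSET, or -1 if
   that piece does not start on a register boundary or falls outside
   the value.  */
static int
toy_hard_subreg_regno (int regno, unsigned inner_size, unsigned reg_size,
		       unsigned outer_size, unsigned offset)
{
  assert (reg_size > 0 && outer_size > 0);

  /* The selected bytes must lie inside the inner value.  */
  if (offset + outer_size > inner_size)
    return -1;

  /* The piece must start on a register boundary.  */
  if (offset % reg_size != 0)
    return -1;

  return regno + (int) (offset / reg_size);
}
```

With a 16-byte value held in one 16-byte register, only offset 0 yields
a register; the same value spread over four 4-byte registers accepts
offset 4, which is the "otherwise valid for integer registers" case.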

Thanks,
Richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 17:57                                                   ` Richard Sandiford
@ 2013-12-10 18:21                                                     ` H.J. Lu
  2013-12-10 18:26                                                       ` Richard Sandiford
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 18:21 UTC (permalink / raw)
  To: H.J. Lu, Kirill Yukhin, Richard Henderson, Tejas Belagod, Yukhin,
	Kirill, Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak,
	Jakub Jelinek, Richard Sandiford

On Tue, Dec 10, 2013 at 9:57 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>>> On 09 Dec 14:08, H.J. Lu wrote:
>>>>
>>>> There are no regressions on Linux/x86-64 with -m32 and -m64.
>>>> Can you check if it improves code quality on x86?
>>>
>>> On second thought: if Tejas and Richard are right and it is simply incorrect
>>> to check any offsets in this hook, maybe we can end up with the patch at the
>>> bottom?
>>
>> What is wrong to pass the correct offset to
>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>> ignore it.
>
> The point is that:
>
>>> -      /* Vector registers do not support subreg with nonzero offsets, which
>>> -        are otherwise valid for integer registers.  Since we can't see
>>> -        whether we have a nonzero offset from here, prohibit all
>>> -         nonparadoxical subregs changing size.  */
>>> -      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
>>> -       return true;
>
> seems to be trying to reject things like (subreg:SF (reg:V4SF X) 1),
> which is always invalid for a single-register V4SF.  See:

That is correct.

>     http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00824.html
>
> for the longer version.

Of all the places where CANNOT_CHANGE_MODE_CLASS is used,
only mode_change_ok, record_subregs_of_mode and inherit_piecemeal_p
lack a known subreg offset.  In all other places, we know
exactly what the subreg offset is.  When the subreg offset is passed
to CANNOT_CHANGE_MODE_CLASS, a backend can have

(subreg:DI (match_operand:V4SF 1 "register_operand" "x,x") 0)

in patterns.  I pushed the hjl/subreg branch to the GCC git repo with
a new pattern:


(define_insn "*mov<VMOVE:mode><VMOVE_SWI48:mode>_subreg"
 [(set (match_operand:VMOVE_SWI48 0 "nonimmediate_operand"           "=rxm")
       (subreg:VMOVE_SWI48 (match_operand:VMOVE 1 "register_operand" "x") 0))]
 ""
{
#if 1
  /* Help check where the subreg pattern is used.  */
  debug_rtx (insn);
  abort ();
#else
  /* Handle broken assemblers that require movd instead of movq.  */
  if (<VMOVE_SWI48:MODE>mode == SImode
      || (!HAVE_AS_IX86_INTERUNIT_MOVQ
          && (GENERAL_REG_P (operands[0]))))
    return "%vmovd\t{%x1, %0|%0, %x1}";
  return "%vmovq\t{%x1, %0|%0, %x1}";
#endif
}
  [(set_attr "type" "ssemov")
   (set_attr "prefix" "maybe_vex")
   (set_attr "mode" "<VMOVE_SWI48:MODE>")])

I ran the GCC testsuites with all languages enabled.  This pattern
is triggered 1178 times.  I checked a few of them.  The new pattern
leads to a reg-reg move instead of a mem-reg load.


-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 18:21                                                     ` H.J. Lu
@ 2013-12-10 18:26                                                       ` Richard Sandiford
  2013-12-10 18:33                                                         ` H.J. Lu
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Sandiford @ 2013-12-10 18:26 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Kirill Yukhin, Richard Henderson, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek

"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Tue, Dec 10, 2013 at 9:57 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>> On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin
>>> <kirill.yukhin@gmail.com> wrote:
>>>> On 09 Dec 14:08, H.J. Lu wrote:
>>>>>
>>>>> There are no regressions on Linux/x86-64 with -m32 and -m64.
>>>>> Can you check if it improves code quality on x86?
>>>>
>>>> As second thought. If Tejas and Richard are right and it is simply incorrect
>>>> to check any offsets in this hook, may be we can end up with patch in the
>>>> bottom?
>>>
>>> What is wrong to pass the correct offset to
>>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>>> ignore it.
>>
>> The point is that:
>>
>>>> -      /* Vector registers do not support subreg with nonzero offsets, which
>>>> -        are otherwise valid for integer registers.  Since we can't see
>>>> -        whether we have a nonzero offset from here, prohibit all
>>>> -         nonparadoxical subregs changing size.  */
>>>> -      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
>>>> -       return true;
>>
>> seems to be trying to reject things like (subreg:SF (reg:V4SF X) 1),
>> which is always invalid for a single-register V4SF.  See:
>
> That is correct.

Sorry, what I mean is: that subreg is always invalid for single-
register V4SFs regardless of the target.  This isn't something that
CANNOT_CHANGE_MODE_CLASS should be expected to check.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 18:26                                                       ` Richard Sandiford
@ 2013-12-10 18:33                                                         ` H.J. Lu
  2013-12-10 18:45                                                           ` Richard Sandiford
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 18:33 UTC (permalink / raw)
  To: H.J. Lu, Kirill Yukhin, Richard Henderson, Tejas Belagod, Yukhin,
	Kirill, Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak,
	Jakub Jelinek, Richard Sandiford

On Tue, Dec 10, 2013 at 10:26 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> On Tue, Dec 10, 2013 at 9:57 AM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>> On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin
>>>> <kirill.yukhin@gmail.com> wrote:
>>>>> On 09 Dec 14:08, H.J. Lu wrote:
>>>>>>
>>>>>> There are no regressions on Linux/x86-64 with -m32 and -m64.
>>>>>> Can you check if it improves code quality on x86?
>>>>>
>>>>> As second thought. If Tejas and Richard are right and it is simply incorrect
>>>>> to check any offsets in this hook, may be we can end up with patch in the
>>>>> bottom?
>>>>
>>>> What is wrong to pass the correct offset to
>>>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>>>> ignore it.
>>>
>>> The point is that:
>>>
>>>>> -      /* Vector registers do not support subreg with nonzero offsets, which
>>>>> -        are otherwise valid for integer registers.  Since we can't see
>>>>> -        whether we have a nonzero offset from here, prohibit all
>>>>> -         nonparadoxical subregs changing size.  */
>>>>> -      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
>>>>> -       return true;
>>>
>>> seems to be trying to reject things like (subreg:SF (reg:V4SF X) 1),
>>> which is always invalid for a single-register V4SF.  See:
>>
>> That is correct.
>
> Sorry, what I mean is: that subreg is always invalid for single-
> register V4SFs regardless of the target.  This isn't something that
> CANNOT_CHANGE_MODE_CLASS should be expected to check.
>

Why is

(define_insn "*movv4sfdi_subreg"
 [(set (match_operand:DI 0 "nonimmediate_operand"           "=rxm")
       (subreg:DI (match_operand:V4SF 1 "register_operand" "x") 0))]

invalid?  It can be used in libffi.call/struct8.c.  The hjl/subreg
branch generates

movq    %xmm0, %xmm0    # 39    *movv4sfdi_subreg    [length = 4]

instead of

movq    -56(%rsp), %xmm0    # 44    *movdi_internal/15    [length = 7]

Both clear the upper 64 bits of %xmm0.

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 18:33                                                         ` H.J. Lu
@ 2013-12-10 18:45                                                           ` Richard Sandiford
  2013-12-10 18:46                                                             ` H.J. Lu
  2013-12-10 20:40                                                             ` Richard Henderson
  0 siblings, 2 replies; 76+ messages in thread
From: Richard Sandiford @ 2013-12-10 18:45 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Kirill Yukhin, Richard Henderson, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek

"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Tue, Dec 10, 2013 at 10:26 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>> On Tue, Dec 10, 2013 at 9:57 AM, Richard Sandiford
>>> <rdsandiford@googlemail.com> wrote:
>>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>> On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin
>>>>> <kirill.yukhin@gmail.com> wrote:
>>>>>> On 09 Dec 14:08, H.J. Lu wrote:
>>>>>>>
>>>>>>> There are no regressions on Linux/x86-64 with -m32 and -m64.
>>>>>>> Can you check if it improves code quality on x86?
>>>>>>
>>>>>> As second thought. If Tejas and Richard are right and it is simply
>>>>>> incorrect
>>>>>> to check any offsets in this hook, may be we can end up with patch in the
>>>>>> bottom?
>>>>>
>>>>> What is wrong to pass the correct offset to
>>>>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>>>>> ignore it.
>>>>
>>>> The point is that:
>>>>
>>>>>> - /* Vector registers do not support subreg with nonzero offsets,
>>>>>> which
>>>>>> -        are otherwise valid for integer registers.  Since we can't see
>>>>>> -        whether we have a nonzero offset from here, prohibit all
>>>>>> -         nonparadoxical subregs changing size.  */
>>>>>> -      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
>>>>>> -       return true;
>>>>
>>>> seems to be trying to reject things like (subreg:SF (reg:V4SF X) 1),
>>>> which is always invalid for a single-register V4SF.  See:
>>>
>>> That is correct.
>>
>> Sorry, what I mean is: that subreg is always invalid for single-
>> register V4SFs regardless of the target.  This isn't something that
>> CANNOT_CHANGE_MODE_CLASS should be expected to check.
>>
>
> Why is
>
> (define_insn "*movv4sfdi_subreg"
>  [(set (match_operand:DI 0 "nonimmediate_operand"           "=rxm")
>        (subreg:DI (match_operand:V4SF 1 "register_operand" "x") 0))]
>
> invalid?

Sorry, I don't understand.  I never said it was invalid.  I said
(subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
a single register.  On a little-endian target, the offset cannot be
anything other than 0 in that case.

So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
something that is always invalid, regardless of the target.  That kind
of situation should be rejected by target-independent code instead.
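
To make that concrete, here is a minimal C sketch of the kind of
target-independent check meant above.  It is only a toy model, not GCC
code: the function name and its parameters are invented for
illustration, and it covers only non-paradoxical subregs of a value
that is assumed to live in exactly one hard register.

```c
#include <stdbool.h>

/* Toy model (hypothetical, not GCC's subreg_get_info): decide whether a
   non-paradoxical subreg of a value held in ONE hard register can be
   represented.  Sizes and OFFSET are in bytes.  The only piece of a
   single register that can be named is its low part, which this sketch
   places at byte offset 0 on a little-endian target and at
   INNER_SIZE - OUTER_SIZE on a fully big-endian one.  */
static bool
toy_single_reg_subreg_ok (unsigned inner_size, unsigned outer_size,
                          unsigned offset, bool big_endian)
{
  /* Paradoxical or empty subregs are outside what this sketch models.  */
  if (outer_size == 0 || outer_size > inner_size)
    return false;

  unsigned lowpart = big_endian ? inner_size - outer_size : 0;
  return offset == lowpart;
}
```

With a 16-byte V4SF, toy_single_reg_subreg_ok (16, 8, 0, false) accepts
the DImode low half, while a nonzero offset such as the 1 in the example
above is rejected without asking the target anything.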

In other words I'm arguing against the idea of passing the offset to
CANNOT_CHANGE_MODE_CLASS (which you seemed to be supporting in the
quote above).  I think Kirill's patch to remove the i386.c check was
the right way to go.

There's no need for a separate insn though.  Once you allow the subregs
(as per Kirill's patch), the normal move patterns will handle them.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 18:45                                                           ` Richard Sandiford
@ 2013-12-10 18:46                                                             ` H.J. Lu
  2013-12-10 20:40                                                             ` Richard Henderson
  1 sibling, 0 replies; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 18:46 UTC (permalink / raw)
  To: Kirill Yukhin, Richard Henderson, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On Tue, Dec 10, 2013 at 10:44 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> On Tue, Dec 10, 2013 at 10:26 AM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>> On Tue, Dec 10, 2013 at 9:57 AM, Richard Sandiford
>>>> <rdsandiford@googlemail.com> wrote:
>>>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>>> On Tue, Dec 10, 2013 at 8:05 AM, Kirill Yukhin
>>>>>> <kirill.yukhin@gmail.com> wrote:
>>>>>>> On 09 Dec 14:08, H.J. Lu wrote:
>>>>>>>>
>>>>>>>> There are no regressions on Linux/x86-64 with -m32 and -m64.
>>>>>>>> Can you check if it improves code quality on x86?
>>>>>>>
>>>>>>> As second thought. If Tejas and Richard are right and it is simply
>>>>>>> incorrect
>>>>>>> to check any offsets in this hook, may be we can end up with patch in the
>>>>>>> bottom?
>>>>>>
>>>>>> What is wrong to pass the correct offset to
>>>>>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>>>>>> ignore it.
>>>>>
>>>>> The point is that:
>>>>>
>>>>>>> - /* Vector registers do not support subreg with nonzero offsets,
>>>>>>> which
>>>>>>> -        are otherwise valid for integer registers.  Since we can't see
>>>>>>> -        whether we have a nonzero offset from here, prohibit all
>>>>>>> -         nonparadoxical subregs changing size.  */
>>>>>>> -      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
>>>>>>> -       return true;
>>>>>
>>>>> seems to be trying to reject things like (subreg:SF (reg:V4SF X) 1),
>>>>> which is always invalid for a single-register V4SF.  See:
>>>>
>>>> That is correct.
>>>
>>> Sorry, what I mean is: that subreg is always invalid for single-
>>> register V4SFs regardless of the target.  This isn't something that
>>> CANNOT_CHANGE_MODE_CLASS should be expected to check.
>>>
>>
>> Why is
>>
>> (define_insn "*movv4sfdi_subreg"
>>  [(set (match_operand:DI 0 "nonimmediate_operand"           "=rxm")
>>        (subreg:DI (match_operand:V4SF 1 "register_operand" "x") 0))]
>>
>> invalid?
>
> Sorry, I don't understand.  I never said it was invalid.  I said
> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
> a single register.  On a little-endian target, the offset cannot be
> anything other than 0 in that case.
>
> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
> something that is always invalid, regardless of the target.  That kind
> of situation should be rejected by target-independent code instead.
>
> In other words I'm arguing against the idea of passing the offset to
> CANNOT_CHANGE_MODE_CLASS (which you seemed to be supporting in the
> quote above).  I think Kirill's patch to remove the i386.c check was
> the right way to go.
>
> There's no need for a separate insn though.  Once you allow the subregs
> (as per Kirill's patch), the normal move patterns will handle them.
>

We will wait for Kirill's results.


-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 17:39                                                         ` H.J. Lu
@ 2013-12-10 19:05                                                           ` Tejas Belagod
  2013-12-10 19:12                                                             ` H.J. Lu
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-12-10 19:05 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Kirill Yukhin, Richard Henderson, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

H.J. Lu wrote:
> On Tue, Dec 10, 2013 at 9:26 AM, Tejas Belagod <tbelagod@arm.com> wrote:
>> H.J. Lu wrote:
>>> On Tue, Dec 10, 2013 at 9:04 AM, Kirill Yukhin <kirill.yukhin@gmail.com>
>>> wrote:
>>>> On 10 Dec 08:23, H.J. Lu wrote:
>>>>> What is wrong to pass the correct offset to
>>>>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>>>>> ignore it.
>>>> Yes, but as fas as understand this hook as a predicate
>>>> saying if it not-safe to change mode1 to mode2 for given
>>>
>>> In many places, the macro is used with the known offset.
>>> I have a follow up patch which improves x86 code generation,
>>> in cases like:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45198
>>>
>> What is it that subreg_get_info () can't resolve that
>> CANNOT_CHANGE_MODE_CLASS with an offset can?
>>
> 
> We have
> 
> int
> simplify_subreg_regno (unsigned int xregno, enum machine_mode xmode,
>                        unsigned int offset, enum machine_mode ymode)
> {
>   struct subreg_info info;
>   unsigned int yregno;
> 
> #ifdef CANNOT_CHANGE_MODE_CLASS
>   /* Give the backend a chance to disallow the mode change.  */
>   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
>       && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
>       && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
>       /* We can use mode change in LRA for some transformations.  */
>       && ! lra_in_progress)
>     return -1;
> #endif
> 
> CANNOT_CHANGE_MODE_CLASS is checked before subreg_get_info is
> called.  When REG_CANNOT_CHANGE_MODE_P returns true,
> there is nothing subreg_get_info can do.
> 

So, if (subreg:DI (match_operand:V4SF 1 "register_operand" "x,x") 0) is a valid 
subreg, why not allow it in CANNOT_CHANGE_MODE_CLASS (like in Kirill's patch 
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00987.html) and resolve the actual 
register later in subreg_get_info ()?

In general, as I understand it, CANNOT_CHANGE_MODE_CLASS doesn't need an offset 
because it only checks for validity of a mode-change in a regclass from the 
point of view of bit-correct-representation (eg. a register when subreg'ed still 
has the same order of bits) - the actual register reference is done elsewhere 
using the offset.
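
A rough sketch of that split, as I picture it, is below.  This is a toy
model in C, not GCC code: the register classes, the function names and
the little-endian, non-paradoxical-only rules are all assumptions made
for illustration.  Stage 1 answers the class-level question without an
offset; stage 2 uses the offset to find the actual register.

```c
#include <stdbool.h>

enum toy_class { TOY_GENERAL, TOY_FLOAT, TOY_VECTOR };

/* Stage 1 (hypothetical stand-in for a CANNOT_CHANGE_MODE_CLASS-style
   predicate): is the bit representation preserved when a register of
   RCLASS is viewed in a mode of a different size?  No offset is needed.
   In this toy target only TOY_FLOAT registers break on a size change.  */
static bool
toy_cannot_change_mode (enum toy_class rclass, unsigned from_size,
                        unsigned to_size)
{
  return rclass == TOY_FLOAT && from_size != to_size;
}

/* Stage 2 (hypothetical stand-in for offset resolution): map a byte
   OFFSET into a value starting at hard register REGNO onto a register
   number, or return -1 if the piece does not start at a register
   boundary.  Little-endian, non-paradoxical subregs only.  */
static int
toy_resolve_offset (unsigned regno, unsigned reg_size, unsigned inner_size,
                    unsigned outer_size, unsigned offset)
{
  if (reg_size == 0 || outer_size > inner_size
      || offset + outer_size > inner_size)
    return -1;
  if (offset % reg_size != 0)
    return -1;
  return (int) (regno + offset / reg_size);
}

/* The two stages chained in the same order as the check quoted above.  */
static int
toy_subreg_regno (enum toy_class rclass, unsigned regno, unsigned reg_size,
                  unsigned inner_size, unsigned outer_size, unsigned offset)
{
  if (toy_cannot_change_mode (rclass, inner_size, outer_size))
    return -1;
  return toy_resolve_offset (regno, reg_size, inner_size, outer_size,
                             offset);
}
```

In this model a 16-byte TOY_VECTOR register accepts an 8-byte piece at
offset 0 and rejects one at offset 8, and the rejection comes from
stage 2 alone; stage 1 never needed to see the offset.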

Thanks,
Tejas.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 19:05                                                           ` Tejas Belagod
@ 2013-12-10 19:12                                                             ` H.J. Lu
  2013-12-10 19:52                                                               ` Paul_Koning
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 19:12 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Kirill Yukhin, Richard Henderson, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Richard Sandiford, Uros Bizjak,
	Jakub Jelinek

On Tue, Dec 10, 2013 at 11:05 AM, Tejas Belagod <tbelagod@arm.com> wrote:
> H.J. Lu wrote:
>>
>> On Tue, Dec 10, 2013 at 9:26 AM, Tejas Belagod <tbelagod@arm.com> wrote:
>>>
>>> H.J. Lu wrote:
>>>>
>>>> On Tue, Dec 10, 2013 at 9:04 AM, Kirill Yukhin <kirill.yukhin@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 10 Dec 08:23, H.J. Lu wrote:
>>>>>>
>>>>>> What is wrong to pass the correct offset to
>>>>>> CANNOT_CHANGE_MODE_CLASS?  Backends are free to
>>>>>> ignore it.
>>>>>
>>>>> Yes, but as fas as understand this hook as a predicate
>>>>> saying if it not-safe to change mode1 to mode2 for given
>>>>
>>>>
>>>> In many places, the macro is used with the known offset.
>>>> I have a follow up patch which improves x86 code generation,
>>>> in cases like:
>>>>
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45198
>>>>
>>> What is it that subreg_get_info () can't resolve that
>>> CANNOT_CHANGE_MODE_CLASS with an offset can?
>>>
>>
>> We have
>>
>> int
>> simplify_subreg_regno (unsigned int xregno, enum machine_mode xmode,
>>                        unsigned int offset, enum machine_mode ymode)
>> {
>>   struct subreg_info info;
>>   unsigned int yregno;
>>
>> #ifdef CANNOT_CHANGE_MODE_CLASS
>>   /* Give the backend a chance to disallow the mode change.  */
>>   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
>>       && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
>>       && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
>>       /* We can use mode change in LRA for some transformations.  */
>>       && ! lra_in_progress)
>>     return -1;
>> #endif
>>
>> CANNOT_CHANGE_MODE_CLASS is checked before subreg_get_info is
>> called.  When REG_CANNOT_CHANGE_MODE_P returns true,
>> there is nothing subreg_get_info can do.
>>
>
> So, if (subreg:DI (match_operand:V4SF 1 "register_operand" "x,x") 0) is a
> valid subreg, why not allow it in CANNOT_CHANGE_MODE_CLASS (like in Kirill's
> patch http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00987.html) and resolve
> the actual register later in subreg_get_info ()?

Let's wait for Kirill's results on the GCC testsuite.

> In general, as I understand it, CANNOT_CHANGE_MODE_CLASS doesn't need an
> offset because it only checks for validity of a mode-change in a regclass
> from the point of view of bit-correct-representation (eg. a register when
> subreg'ed still has the same order of bits) - the actual register reference
> is done elsewhere using the offset.
>
> Thanks,
> Tejas.
>



-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 19:12                                                             ` H.J. Lu
@ 2013-12-10 19:52                                                               ` Paul_Koning
  0 siblings, 0 replies; 76+ messages in thread
From: Paul_Koning @ 2013-12-10 19:52 UTC (permalink / raw)
  To: hjl.tools; +Cc: kirill.yukhin, gcc-patches


On Dec 10, 2013, at 2:12 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

> On Tue, Dec 10, 2013 at 11:05 AM, Tejas Belagod <tbelagod@arm.com> wrote:
>> ...
>> So, if (subreg:DI (match_operand:V4SF 1 "register_operand" "x,x") 0) is a
>> valid subreg, why not allow it in CANNOT_CHANGE_MODE_CLASS (like in Kirill's
>> patch http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00987.html) and resolve
>> the actual register later in subreg_get_info ()?
> 
> Let's wait for Kirill's results on GCC testsuite.

I'm puzzled.  What is the connection between testsuite results and a design decision about a code change?  The question remains valid even if the testsuite didn't exist at all.

	paul

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 18:45                                                           ` Richard Sandiford
  2013-12-10 18:46                                                             ` H.J. Lu
@ 2013-12-10 20:40                                                             ` Richard Henderson
  2013-12-10 21:09                                                               ` H.J. Lu
  2013-12-11  9:14                                                               ` Richard Sandiford
  1 sibling, 2 replies; 76+ messages in thread
From: Richard Henderson @ 2013-12-10 20:40 UTC (permalink / raw)
  To: H.J. Lu, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	rdsandiford

On 12/10/2013 10:44 AM, Richard Sandiford wrote:
> Sorry, I don't understand.  I never said it was invalid.  I said
> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
> a single register.  On a little-endian target, the offset cannot be
> anything other than 0 in that case.
> 
> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
> something that is always invalid, regardless of the target.  That kind
> of situation should be rejected by target-independent code instead.

But, we want to disable the subreg before we know whether or not (reg:V4SF X)
will be allocated to a single hard register.  That is something that we can't
know in target-independent code before register allocation.
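
A small C sketch of why the answer is not known up front follows.  It
is a toy model, not GCC code: the names, the list of candidate register
sizes and the little-endian "piece must start at a register boundary"
rule are all assumptions for illustration.  The same subreg can be fine
in one candidate class and unrepresentable in another.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_verdict
{
  TOY_NEVER_OK,               /* Invalid wherever the pseudo lands.  */
  TOY_DEPENDS_ON_ALLOCATION,  /* Valid only in some candidate classes.  */
  TOY_ALWAYS_OK               /* Valid wherever the pseudo lands.  */
};

/* Hypothetical rule: a non-paradoxical piece is addressable only if it
   lies inside the value and starts at a hard register boundary.  */
static bool
toy_offset_ok (unsigned reg_size, unsigned inner_size, unsigned outer_size,
               unsigned offset)
{
  return reg_size != 0
         && outer_size <= inner_size
         && offset + outer_size <= inner_size
         && offset % reg_size == 0;
}

/* REG_SIZES lists the per-register byte size of each class the pseudo
   might be allocated to (hypothetical input).  */
static enum toy_verdict
toy_classify (const unsigned *reg_sizes, size_t n, unsigned inner_size,
              unsigned outer_size, unsigned offset)
{
  size_t ok = 0;

  for (size_t i = 0; i < n; i++)
    if (toy_offset_ok (reg_sizes[i], inner_size, outer_size, offset))
      ok++;

  if (ok == 0)
    return TOY_NEVER_OK;
  return ok == n ? TOY_ALWAYS_OK : TOY_DEPENDS_ON_ALLOCATION;
}
```

For a 16-byte value that could end up either in a pair of 8-byte
registers or in one 16-byte register, an 8-byte piece at offset 8 comes
out as TOY_DEPENDS_ON_ALLOCATION, which is the case that needs a
decision before allocation.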

> In other words I'm arguing against the idea of passing the offset to
> CANNOT_CHANGE_MODE_CLASS (which you seemed to be supporting in the
> quote above).  I think Kirill's patch to remove the i386.c check was
> the right way to go.


Unless you can figure a way around the above, I think passing the offset to
C_C_M_C is probably the way to go.  I need to have a look over the patches
though...

> 
> There's no need for a separate insn though.  Once you allow the subregs
> (as per Kirill's patch), the normal move patterns will handle them.

Absolutely.


r~

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 20:40                                                             ` Richard Henderson
@ 2013-12-10 21:09                                                               ` H.J. Lu
  2013-12-10 21:51                                                                 ` H.J. Lu
  2013-12-11  9:14                                                               ` Richard Sandiford
  1 sibling, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 21:09 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Kirill Yukhin, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On Tue, Dec 10, 2013 at 12:39 PM, Richard Henderson <rth@redhat.com> wrote:
> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>> Sorry, I don't understand.  I never said it was invalid.  I said
>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>> a single register.  On a little-endian target, the offset cannot be
>> anything other than 0 in that case.
>>
>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>> something that is always invalid, regardless of the target.  That kind
>> of situation should be rejected by target-independent code instead.
>
> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
> will be allocated to a single hard register.  That is something that we can't
> know in target-independent code before register allocation.

I tried Kirill's patch.  But LRA isn't prepared to handle it:

spawn -ignore SIGHUP /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-x86_64-linux/gcc/
/export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c
-B/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/
-L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/.libs
-latomic -fno-diagnostics-show-caret -fdiagnostics-color=never -O1
-std=c11 -pedantic-errors -lm -m32 -o ./c11-atomic-exec-1.exe
/export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:
In function 'test_simple_assign':
/export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:81:1:
internal compiler error: Maximum number of LRA constraint passes is
achieved (30)

0x88ed77 lra_constraints(bool)
        /export/gnu/import/git/gcc/gcc/lra-constraints.c:3871
0x87fe8c lra(_IO_FILE*)
        /export/gnu/import/git/gcc/gcc/lra.c:2331
0x840f76 do_reload
        /export/gnu/import/git/gcc/gcc/ira.c:5455
0x840f76 rest_of_handle_reload
        /export/gnu/import/git/gcc/gcc/ira.c:5584
0x840f76 execute
        /export/gnu/import/git/gcc/gcc/ira.c:5613

>> In other words I'm arguing against the idea of passing the offset to
>> CANNOT_CHANGE_MODE_CLASS (which you seemed to be supporting in the
>> quote above).  I think Kirill's patch to remove the i386.c check was
>> the right way to go.
>
>
> Unless you can figure a way around the above, I think passing the offset to
> C_C_M_C is probably the way to go.  I need to have a look over the patches
> though...
>
>>
>> There's no need for a separate insn though.  Once you allow the subregs
>> (as per Kirill's patch), the normal move patterns will handle them.
>
> Absolutely.
>

We may need to adjust the existing patterns if subreg is allowed.
I have a few small testcases I can try.

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 21:09                                                               ` H.J. Lu
@ 2013-12-10 21:51                                                                 ` H.J. Lu
  2013-12-10 22:25                                                                   ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 21:51 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Kirill Yukhin, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On Tue, Dec 10, 2013 at 1:09 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Dec 10, 2013 at 12:39 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>> a single register.  On a little-endian target, the offset cannot be
>>> anything other than 0 in that case.
>>>
>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>> something that is always invalid, regardless of the target.  That kind
>>> of situation should be rejected by target-independent code instead.
>>
>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>> will be allocated to a single hard register.  That is something that we can't
>> know in target-independent code before register allocation.
>
> I tried Kirill's patch.  But LRA isn't prepared to handle it:
>
> spawn -ignore SIGHUP /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/gcc/build-x86_64-linux/gcc/
> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c
> -B/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/
> -L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/.libs
> -latomic -fno-diagnostics-show-caret -fdiagnostics-color=never -O1
> -std=c11 -pedantic-errors -lm -m32 -o ./c11-atomic-exec-1.exe
> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:
> In function 'test_simple_assign':
> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:81:1:
> internal compiler error: Maximum number of LRA constraint passes is
> achieved (30)
>
> 0x88ed77 lra_constraints(bool)
>         /export/gnu/import/git/gcc/gcc/lra-constraints.c:3871
> 0x87fe8c lra(_IO_FILE*)
>         /export/gnu/import/git/gcc/gcc/lra.c:2331
> 0x840f76 do_reload
>         /export/gnu/import/git/gcc/gcc/ira.c:5455
> 0x840f76 rest_of_handle_reload
>         /export/gnu/import/git/gcc/gcc/ira.c:5584
> 0x840f76 execute
>         /export/gnu/import/git/gcc/gcc/ira.c:5613
>

I got several hundred failures like this in the GCC
testsuite with -m32 and -m64 on Linux/x86-64.

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 21:51                                                                 ` H.J. Lu
@ 2013-12-10 22:25                                                                   ` Tejas Belagod
  2013-12-10 22:33                                                                     ` H.J. Lu
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-12-10 22:25 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Henderson, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On 10 December 2013 21:51, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Dec 10, 2013 at 1:09 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Dec 10, 2013 at 12:39 PM, Richard Henderson <rth@redhat.com> wrote:
>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>>> a single register.  On a little-endian target, the offset cannot be
>>>> anything other than 0 in that case.
>>>>
>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>>> something that is always invalid, regardless of the target.  That kind
>>>> of situation should be rejected by target-independent code instead.
>>>
>>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>>> will be allocated to a single hard register.  That is something that we can't
>>> know in target-independent code before register allocation.
>>
>> I tried Kirill's patch.  But LRA isn't prepared to handle it:
>>
>> spawn -ignore SIGHUP /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
>> -B/export/build/gnu/gcc/build-x86_64-linux/gcc/
>> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c
>> -B/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/
>> -L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/.libs
>> -latomic -fno-diagnostics-show-caret -fdiagnostics-color=never -O1
>> -std=c11 -pedantic-errors -lm -m32 -o ./c11-atomic-exec-1.exe
>> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:
>> In function 'test_simple_assign':
>> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:81:1:
>> internal compiler error: Maximum number of LRA constraint passes is
>> achieved (30)
>>
>> 0x88ed77 lra_constraints(bool)
>>         /export/gnu/import/git/gcc/gcc/lra-constraints.c:3871
>> 0x87fe8c lra(_IO_FILE*)
>>         /export/gnu/import/git/gcc/gcc/lra.c:2331
>> 0x840f76 do_reload
>>         /export/gnu/import/git/gcc/gcc/ira.c:5455
>> 0x840f76 rest_of_handle_reload
>>         /export/gnu/import/git/gcc/gcc/ira.c:5584
>> 0x840f76 execute
>>         /export/gnu/import/git/gcc/gcc/ira.c:5613
>>
>
> I got several hundred failures like this in GCC
> testsuite with -m32 and -m64 on Linux/x86-64.
>

I think this is the same as:

http://gcc.gnu.org/ml/gcc/2013-12/msg00086.html

LRA does not seem to know how to resolve subregs with non-zero offsets
that don't map to a full-hardreg.

Thanks,
Tejas.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 22:25                                                                   ` Tejas Belagod
@ 2013-12-10 22:33                                                                     ` H.J. Lu
  2013-12-11  1:33                                                                       ` H.J. Lu
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-10 22:33 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Richard Henderson, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On Tue, Dec 10, 2013 at 2:25 PM, Tejas Belagod <belagod.tejas@gmail.com> wrote:
> On 10 December 2013 21:51, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Dec 10, 2013 at 1:09 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Dec 10, 2013 at 12:39 PM, Richard Henderson <rth@redhat.com> wrote:
>>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>>>> a single register.  On a little-endian target, the offset cannot be
>>>>> anything other than 0 in that case.
>>>>>
>>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>>>> something that is always invalid, regardless of the target.  That kind
>>>>> of situation should be rejected by target-independent code instead.
>>>>
>>>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>>>> will be allocated to a single hard register.  That is something that we can't
>>>> know in target-independent code before register allocation.
>>>
>>> I tried Kirill's patch.  But LRA isn't prepared to handle it:
>>>
>>> spawn -ignore SIGHUP /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
>>> -B/export/build/gnu/gcc/build-x86_64-linux/gcc/
>>> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c
>>> -B/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/
>>> -L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/.libs
>>> -latomic -fno-diagnostics-show-caret -fdiagnostics-color=never -O1
>>> -std=c11 -pedantic-errors -lm -m32 -o ./c11-atomic-exec-1.exe
>>> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:
>>> In function 'test_simple_assign':
>>> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:81:1:
>>> internal compiler error: Maximum number of LRA constraint passes is
>>> achieved (30)
>>>
>>> 0x88ed77 lra_constraints(bool)
>>>         /export/gnu/import/git/gcc/gcc/lra-constraints.c:3871
>>> 0x87fe8c lra(_IO_FILE*)
>>>         /export/gnu/import/git/gcc/gcc/lra.c:2331
>>> 0x840f76 do_reload
>>>         /export/gnu/import/git/gcc/gcc/ira.c:5455
>>> 0x840f76 rest_of_handle_reload
>>>         /export/gnu/import/git/gcc/gcc/ira.c:5584
>>> 0x840f76 execute
>>>         /export/gnu/import/git/gcc/gcc/ira.c:5613
>>>
>>
>> I got several hundred failures like this in GCC
>> testsuite with -m32 and -m64 on Linux/x86-64.
>>
>
> I think this is the same as:
>
> http://gcc.gnu.org/ml/gcc/2013-12/msg00086.html
>
> LRA does not seem to know how to resolve subregs with non-zero offsets
> that don't map to a full-hardreg.
>

Looks like it.  I am rebuilding and retesting Kirill's patch +
Vladimir's patch.

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 22:33                                                                     ` H.J. Lu
@ 2013-12-11  1:33                                                                       ` H.J. Lu
  0 siblings, 0 replies; 76+ messages in thread
From: H.J. Lu @ 2013-12-11  1:33 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Richard Henderson, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On Tue, Dec 10, 2013 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Dec 10, 2013 at 2:25 PM, Tejas Belagod <belagod.tejas@gmail.com> wrote:
>> On 10 December 2013 21:51, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Dec 10, 2013 at 1:09 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Tue, Dec 10, 2013 at 12:39 PM, Richard Henderson <rth@redhat.com> wrote:
>>>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>>>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>>>>> a single register.  On a little-endian target, the offset cannot be
>>>>>> anything other than 0 in that case.
>>>>>>
>>>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>>>>> something that is always invalid, regardless of the target.  That kind
>>>>>> of situation should be rejected by target-independent code instead.
>>>>>
>>>>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>>>>> will be allocated to a single hard register.  That is something that we can't
>>>>> know in target-independent code before register allocation.
>>>>
>>>> I tried Kirill's patch.  But LRA isn't prepared to handle it:
>>>>
>>>> spawn -ignore SIGHUP /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
>>>> -B/export/build/gnu/gcc/build-x86_64-linux/gcc/
>>>> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c
>>>> -B/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/
>>>> -L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libatomic/.libs
>>>> -latomic -fno-diagnostics-show-caret -fdiagnostics-color=never -O1
>>>> -std=c11 -pedantic-errors -lm -m32 -o ./c11-atomic-exec-1.exe
>>>> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:
>>>> In function 'test_simple_assign':
>>>> /export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-1.c:81:1:
>>>> internal compiler error: Maximum number of LRA constraint passes is
>>>> achieved (30)
>>>>
>>>> 0x88ed77 lra_constraints(bool)
>>>>         /export/gnu/import/git/gcc/gcc/lra-constraints.c:3871
>>>> 0x87fe8c lra(_IO_FILE*)
>>>>         /export/gnu/import/git/gcc/gcc/lra.c:2331
>>>> 0x840f76 do_reload
>>>>         /export/gnu/import/git/gcc/gcc/ira.c:5455
>>>> 0x840f76 rest_of_handle_reload
>>>>         /export/gnu/import/git/gcc/gcc/ira.c:5584
>>>> 0x840f76 execute
>>>>         /export/gnu/import/git/gcc/gcc/ira.c:5613
>>>>
>>>
>>> I got several hundred failures like this in GCC
>>> testsuite with -m32 and -m64 on Linux/x86-64.
>>>
>>
>> I think this is the same as:
>>
>> http://gcc.gnu.org/ml/gcc/2013-12/msg00086.html
>>
>> LRA does not seem to know how to resolve subregs with non-zero offsets
>> that don't map to a full-hardreg.
>>
>
> Looks like it.  I am rebuilding and retesting Kirill's patch +
> Vladimir's patch.
>

Vladimir's patch cuts the number of ICEs down to 53.  I still get:

spawn -ignore SIGHUP /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-x86_64-linux/gcc/
/export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/vect/slp-2.c
-fno-diagnostics-show-caret -fdiagnostics-color=never -msse2
-ftree-vectorize -fno-vect-cost-model -fno-common -O2
-fdump-tree-vect-details -lm -o ./slp-2.exe
/export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/vect/slp-2.c: In
function 'main1':
/export/gnu/import/git/gcc/gcc/testsuite/gcc.dg/vect/slp-2.c:131:1:
internal compiler error: Max. number of generated reload insns per
insn is achieved (90)

0x88ecb3 lra_constraints(bool)
        /export/gnu/import/git/gcc/gcc/lra-constraints.c:3986
0x87fe8c lra(_IO_FILE*)
        /export/gnu/import/git/gcc/gcc/lra.c:2331
0x840f76 do_reload
        /export/gnu/import/git/gcc/gcc/ira.c:5455
0x840f76 rest_of_handle_reload
        /export/gnu/import/git/gcc/gcc/ira.c:5584
0x840f76 execute
        /export/gnu/import/git/gcc/gcc/ira.c:5613
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
compiler exited with status 1

I also got a wrong-code regression:

FAIL: gfortran.dg/round_4.f90  -O0  execution test
FAIL: gfortran.dg/round_4.f90  -O1  execution test
FAIL: gfortran.dg/round_4.f90  -O2  execution test
FAIL: gfortran.dg/round_4.f90  -O3 -fomit-frame-pointer  execution test
FAIL: gfortran.dg/round_4.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
FAIL: gfortran.dg/round_4.f90  -O3 -fomit-frame-pointer -funroll-loops  execution test
FAIL: gfortran.dg/round_4.f90  -O3 -g  execution test
FAIL: gfortran.dg/round_4.f90  -Os  execution test

My hjl/subreg branch doesn't have those regressions.

-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-10 20:40                                                             ` Richard Henderson
  2013-12-10 21:09                                                               ` H.J. Lu
@ 2013-12-11  9:14                                                               ` Richard Sandiford
  2013-12-11 13:10                                                                 ` H.J. Lu
  1 sibling, 1 reply; 76+ messages in thread
From: Richard Sandiford @ 2013-12-11  9:14 UTC (permalink / raw)
  To: Richard Henderson
  Cc: H.J. Lu, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek

Richard Henderson <rth@redhat.com> writes:
> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>> Sorry, I don't understand.  I never said it was invalid.  I said
>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>> a single register.  On a little-endian target, the offset cannot be
>> anything other than 0 in that case.
>> 
>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>> something that is always invalid, regardless of the target.  That kind
>> of situation should be rejected by target-independent code instead.
>
> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
> will be allocated to a single hard register.  That is something that we can't
> know in target-independent code before register allocation.

I was thinking that if we've got a class, we've also got things like
CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
But even in the padding cases an offset-based check in C_C_M_C could
be derived from other information.

subreg_get_info handles padding with:

      nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
      if (GET_MODE_INNER (xmode) == VOIDmode)
	xmode_unit = xmode;
      else
	xmode_unit = GET_MODE_INNER (xmode);
      gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
      gcc_assert (nregs_xmode
		  == (GET_MODE_NUNITS (xmode)
		      * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
      gcc_assert (hard_regno_nregs[xregno][xmode]
		  == (hard_regno_nregs[xregno][xmode_unit]
		      * GET_MODE_NUNITS (xmode)));

      /* You can only ask for a SUBREG of a value with holes in the middle
	 if you don't cross the holes.  (Such a SUBREG should be done by
	 picking a different register class, or doing it in memory if
	 necessary.)  An example of a value with holes is XCmode on 32-bit
	 x86 with -m128bit-long-double; it's represented in 6 32-bit registers,
	 3 for each part, but in memory it's two 128-bit parts.
	 Padding is assumed to be at the end (not necessarily the 'high part')
	 of each unit.  */
      if ((offset / GET_MODE_SIZE (xmode_unit) + 1
	   < GET_MODE_NUNITS (xmode))
	  && (offset / GET_MODE_SIZE (xmode_unit)
	      != ((offset + GET_MODE_SIZE (ymode) - 1)
		  / GET_MODE_SIZE (xmode_unit))))
	{
	  info->representable_p = false;
	  rknown = true;
	}

and I wouldn't really want to force targets to individually reproduce
that kind of logic at the class level.  If the worst comes to the worst
we could cache the difficult cases.
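
As a rough illustration of how little that test actually depends on,
here is a stand-alone model of the hole-crossing condition above.  It
is a sketch, not GCC code: the function name is hypothetical, and the
parameters stand in for GET_MODE_SIZE (xmode_unit),
GET_MODE_NUNITS (xmode) and GET_MODE_SIZE (ymode).

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the condition in subreg_get_info that sets
   representable_p to false.  UNIT_SIZE is the byte size of one unit
   of the inner mode, NUNITS the number of units, YSIZE the byte size
   of the outer mode.  Return true if the subreg starts in a unit
   other than the last one and spills into the next unit.  */
bool
subreg_crosses_hole (unsigned int offset, unsigned int ysize,
		     unsigned int unit_size, unsigned int nunits)
{
  assert (unit_size > 0 && ysize > 0);
  unsigned int first_unit = offset / unit_size;
  unsigned int last_unit = (offset + ysize - 1) / unit_size;
  return first_unit + 1 < nunits && first_unit != last_unit;
}
```

Taking the XCmode example with 16-byte units and two units, a 16-byte
subreg at offset 0 or offset 16 stays within one unit, whereas a
16-byte subreg at offset 8 straddles the two parts and is flagged.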

Thanks,
Richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-11  9:14                                                               ` Richard Sandiford
@ 2013-12-11 13:10                                                                 ` H.J. Lu
  2013-12-11 15:49                                                                   ` Richard Sandiford
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-11 13:10 UTC (permalink / raw)
  To: Richard Henderson, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 5947 bytes --]

On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Henderson <rth@redhat.com> writes:
>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>> a single register.  On a little-endian target, the offset cannot be
>>> anything other than 0 in that case.
>>>
>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>> something that is always invalid, regardless of the target.  That kind
>>> of situation should be rejected by target-independent code instead.
>>
>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>> will be allocated to a single hard register.  That is something that we can't
>> know in target-independent code before register allocation.
>
> I was thinking that if we've got a class, we've also got things like
> CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
> But even in the padding cases an offset-based check in C_C_M_C could
> be derived from other information.
>
> subreg_get_info handles padding with:
>
>       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
>       if (GET_MODE_INNER (xmode) == VOIDmode)
>         xmode_unit = xmode;
>       else
>         xmode_unit = GET_MODE_INNER (xmode);
>       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>       gcc_assert (nregs_xmode
>                   == (GET_MODE_NUNITS (xmode)
>                       * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
>       gcc_assert (hard_regno_nregs[xregno][xmode]
>                   == (hard_regno_nregs[xregno][xmode_unit]
>                       * GET_MODE_NUNITS (xmode)));
>
>       /* You can only ask for a SUBREG of a value with holes in the middle
>          if you don't cross the holes.  (Such a SUBREG should be done by
>          picking a different register class, or doing it in memory if
>          necessary.)  An example of a value with holes is XCmode on 32-bit
>          x86 with -m128bit-long-double; it's represented in 6 32-bit registers,
>          3 for each part, but in memory it's two 128-bit parts.
>          Padding is assumed to be at the end (not necessarily the 'high part')
>          of each unit.  */
>       if ((offset / GET_MODE_SIZE (xmode_unit) + 1
>            < GET_MODE_NUNITS (xmode))
>           && (offset / GET_MODE_SIZE (xmode_unit)
>               != ((offset + GET_MODE_SIZE (ymode) - 1)
>                   / GET_MODE_SIZE (xmode_unit))))
>         {
>           info->representable_p = false;
>           rknown = true;
>         }
>
> and I wouldn't really want to force targets to individually reproduce
> that kind of logic at the class level.  If the worst comes to the worst
> we could cache the difficult cases.
>

In my case, x86 CANNOT_CHANGE_MODE_CLASS only needs to know whether
the subreg byte is zero.  It doesn't care about mode padding.  You are
concerned that the information passed to CANNOT_CHANGE_MODE_CLASS is
too expensive for a target to process.  That isn't the case for x86.
Am I correct that the mode can't change if the subreg byte is non-zero?
A target can just check subreg byte != 0, like my patch does.
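
For reference, the difference in the x86 vector-register check can be
modelled in a few lines of stand-alone C.  This is only a sketch with
hypothetical function names; it takes mode sizes in bytes instead of
machine modes and leaves out the rest of
ix86_cannot_change_mode_class.

```c
#include <assert.h>
#include <stdbool.h>

/* Without the subreg byte: the offset is unknown, so every
   size-reducing mode change has to be rejected.  */
bool
old_cannot_change (unsigned int from_size, unsigned int to_size)
{
  return to_size < from_size;
}

/* With the subreg byte: reject a size-reducing mode change only when
   it is taken at a nonzero offset.  */
bool
new_cannot_change (unsigned int from_size, unsigned int subreg_byte,
		   unsigned int to_size)
{
  assert (from_size > 0 && to_size > 0);
  return subreg_byte != 0 && to_size < from_size;
}
```

With a 16-byte FROM mode and a 4-byte TO mode, the old check rejects
the change outright, while the new one accepts it at byte 0 and
rejects it at byte 4.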

Here is a patch to add SUBREG_BYTE to CANNOT_CHANGE_MODE_CLASS.
Tested on Linux/x86-64.  Does it look OK?

Thanks.

-- 
H.J.
---
2013-12-11   H.J. Lu  <hongjiu.lu@intel.com>

    * combine.c (subst): Pass subreg byte to REG_CANNOT_CHANGE_MODE_P.
    (simplify_set): Likewise.
    * emit-rtl.c (validate_subreg): Likewise.
    * recog.c (register_operand): Likewise.
    * rtlanal.c (simplify_subreg_regno): Likewise.
    * hard-reg-set.h (REG_CANNOT_CHANGE_MODE_P): Add SUBREG_BYTE
    and pass it to CANNOT_CHANGE_MODE_CLASS.
    * regcprop.c (mode_change_ok): Pass unknown subreg byte to
    REG_CANNOT_CHANGE_MODE_P.
    * reginfo.c (record_subregs_of_mode): Pass unknown subreg byte
    to CANNOT_CHANGE_MODE_CLASS.
    * postreload.c (reload_cse_simplify_set): Pass subreg byte to
    CANNOT_CHANGE_MODE_CLASS.
    (reload_cse_simplify_operands): Likewise.
    * reload.c (push_reload): Likewise.
    * reload1.c (choose_reload_regs): Pass subreg byte to
    REG_CANNOT_CHANGE_MODE_P.
    (inherit_piecemeal_p): Pass unknown subreg byte to
    REG_CANNOT_CHANGE_MODE_P.
    * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Add
    and ignore subreg byte.
    * config/alpha/alpha.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/ia64/ia64.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/m32c/m32c.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/mep/mep.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/mips/mips.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/msp430/msp430.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/pa/pa32-regs.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/pa/pa64-regs.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/pdp11/pdp11.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/rs6000/rs6000.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/s390/s390.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/score/score.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/sh/sh.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/sparc/sparc.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/spu/spu.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
    * config/i386/i386-protos.h (ix86_cannot_change_mode_class): Add
    an unsigned int argument.
    * config/i386/i386.c (ix86_cannot_change_mode_class): Take subreg
    byte.  Return true only if subreg byte is non-zero.
    * config/i386/i386.h (CANNOT_CHANGE_MODE_CLASS): Add SUBREG_BYTE
    and pass it to ix86_cannot_change_mode_class.
    * doc/rtl.texi: Add subreg_byte to CANNOT_CHANGE_MODE_CLASS.
    * doc/tm.texi.in: Likewise.
    * doc/tm.texi: Regenerated.

[-- Attachment #2: 0001-Add-subreg_byte-to-CANNOT_CHANGE_MODE_CLASS.patch --]
[-- Type: text/plain, Size: 25778 bytes --]

From 5dbe6ec205636d13a0e614371b9a3e016ea10cb6 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 9 Dec 2013 10:46:24 -0800
Subject: [PATCH] Add subreg_byte to CANNOT_CHANGE_MODE_CLASS

---
 gcc/combine.c                 |  2 ++
 gcc/config/aarch64/aarch64.h  |  6 +++---
 gcc/config/alpha/alpha.h      |  2 +-
 gcc/config/arm/arm.h          |  8 ++++----
 gcc/config/i386/i386-protos.h |  4 +++-
 gcc/config/i386/i386.c        | 12 ++++++------
 gcc/config/i386/i386.h        |  7 ++++---
 gcc/config/ia64/ia64.h        |  2 +-
 gcc/config/m32c/m32c.h        |  2 +-
 gcc/config/mep/mep.h          |  2 +-
 gcc/config/mips/mips.h        |  2 +-
 gcc/config/msp430/msp430.h    | 10 +++++-----
 gcc/config/pa/pa32-regs.h     |  2 +-
 gcc/config/pa/pa64-regs.h     |  2 +-
 gcc/config/pdp11/pdp11.h      |  2 +-
 gcc/config/rs6000/rs6000.h    |  2 +-
 gcc/config/s390/s390.h        |  2 +-
 gcc/config/score/score.h      |  4 ++--
 gcc/config/sh/sh.h            |  2 +-
 gcc/config/sparc/sparc.h      |  2 +-
 gcc/config/spu/spu.h          |  2 +-
 gcc/doc/rtl.texi              |  7 ++++---
 gcc/doc/tm.texi               |  8 +++++---
 gcc/doc/tm.texi.in            |  8 +++++---
 gcc/emit-rtl.c                |  2 +-
 gcc/hard-reg-set.h            |  6 +++---
 gcc/postreload.c              |  4 ++++
 gcc/recog.c                   |  3 ++-
 gcc/regcprop.c                |  4 +++-
 gcc/reginfo.c                 |  1 +
 gcc/reload.c                  | 10 +++++++---
 gcc/reload1.c                 | 10 +++++++---
 gcc/rtlanal.c                 |  2 +-
 33 files changed, 85 insertions(+), 59 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index dea6c28..8e3b962 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5084,6 +5084,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 		      && REGNO (to) < FIRST_PSEUDO_REGISTER
 		      && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
 						   GET_MODE (to),
+						   SUBREG_BYTE (x),
 						   GET_MODE (x)))
 		    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
 #endif
@@ -6450,6 +6451,7 @@ simplify_set (rtx x)
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
 					 GET_MODE (SUBREG_REG (src)),
+					 SUBREG_BYTE (src),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index cead022..7eac69a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -820,9 +820,9 @@ do {									     \
 
 /*  VFP registers may only be accessed in the mode they
    were set.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		\
-   ? reg_classes_intersect_p (FP_REGS, (CLASS))		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	\
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
+   ? reg_classes_intersect_p (FP_REGS, (CLASS))			\
    : 0)
 
 
diff --git a/gcc/config/alpha/alpha.h b/gcc/config/alpha/alpha.h
index 2e7c078..a183a44 100644
--- a/gcc/config/alpha/alpha.h
+++ b/gcc/config/alpha/alpha.h
@@ -541,7 +541,7 @@ enum reg_class {
 
 /* Return the class of registers that cannot change mode from FROM to TO.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	\
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
    ? reg_classes_intersect_p (FLOAT_REGS, CLASS) : 0)
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8b8b80e..f761a3b 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1247,10 +1247,10 @@ enum reg_class
    In big-endian mode, modes greater than word size (i.e. DFmode) are stored in
    VFP registers in little-endian order.  We can't describe that accurately to
    GCC, so avoid taking subregs of such values.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
-  (TARGET_VFP && TARGET_BIG_END				\
-   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD		\
-       || GET_MODE_SIZE (TO) > UNITS_PER_WORD)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	\
+  (TARGET_VFP && TARGET_BIG_END					\
+   && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD			\
+       || GET_MODE_SIZE (TO) > UNITS_PER_WORD)			\
    && reg_classes_intersect_p (VFP_REGS, (CLASS)))
 
 /* The class value for index registers, and the one for base regs.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 73feef2..0cbb9ae 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -167,7 +167,9 @@ extern bool ix86_modes_tieable_p (enum machine_mode, enum machine_mode);
 extern bool ix86_secondary_memory_needed (enum reg_class, enum reg_class,
 					  enum machine_mode, int);
 extern bool ix86_cannot_change_mode_class (enum machine_mode,
-					   enum machine_mode, enum reg_class);
+					   unsigned int,
+					   enum machine_mode,
+					   enum reg_class);
 
 extern int ix86_mode_needed (int, rtx);
 extern int ix86_mode_after (int, int, rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cdd63e5..68628ab 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35003,10 +35003,12 @@ ix86_class_max_nregs (reg_class_t rclass, enum machine_mode mode)
 }
 
 /* Return true if the registers in CLASS cannot represent the change from
-   modes FROM to TO.  */
+   modes FROM at offset SUBREG_BYTE to TO.  */
 
 bool
-ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
+ix86_cannot_change_mode_class (enum machine_mode from,
+			       unsigned int subreg_byte,
+			       enum machine_mode to,
 			       enum reg_class regclass)
 {
   if (from == to)
@@ -35027,10 +35029,8 @@ ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
 	return true;
 
       /* Vector registers do not support subreg with nonzero offsets, which
-	 are otherwise valid for integer registers.  Since we can't see
-	 whether we have a nonzero offset from here, prohibit all
-         nonparadoxical subregs changing size.  */
-      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
+	 are otherwise valid for integer registers.  */
+      if (subreg_byte != 0 && GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
 	return true;
     }
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7efd1e0..d43dcbd 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1522,10 +1522,11 @@ enum reg_class
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
    : MODE)
 
-/* Return a class of registers that cannot change FROM mode to TO mode.  */
+/* Return a class of registers that cannot change FROM mode to TO mode
+   with SUBREG_BYTE.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
-  ix86_cannot_change_mode_class (FROM, TO, CLASS)
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
+  ix86_cannot_change_mode_class (FROM, SUBREG_BYTE, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
 
diff --git a/gcc/config/ia64/ia64.h b/gcc/config/ia64/ia64.h
index ae9027c..d3aca62 100644
--- a/gcc/config/ia64/ia64.h
+++ b/gcc/config/ia64/ia64.h
@@ -856,7 +856,7 @@ enum reg_class
    In FP regs, we can't change FP values to integer values and vice versa,
    but we can change e.g. DImode to SImode, and V2SFmode into DImode.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) 		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) 	\
   (reg_classes_intersect_p (CLASS, BR_REGS)			\
    ? (FROM) != (TO)						\
    : (SCALAR_FLOAT_MODE_P (FROM) != SCALAR_FLOAT_MODE_P (TO)	\
diff --git a/gcc/config/m32c/m32c.h b/gcc/config/m32c/m32c.h
index 3ceb093..497a743 100644
--- a/gcc/config/m32c/m32c.h
+++ b/gcc/config/m32c/m32c.h
@@ -415,7 +415,7 @@ enum reg_class
 
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P hook_bool_mode_true
 
-#define CANNOT_CHANGE_MODE_CLASS(F,T,C) m32c_cannot_change_mode_class(F,T,C)
+#define CANNOT_CHANGE_MODE_CLASS(F,O,T,C) m32c_cannot_change_mode_class(F,T,C)
 
 /* STACK AND CALLING */
 
diff --git a/gcc/config/mep/mep.h b/gcc/config/mep/mep.h
index 023d73c..01bd3cd 100644
--- a/gcc/config/mep/mep.h
+++ b/gcc/config/mep/mep.h
@@ -321,7 +321,7 @@ extern char mep_leaf_registers[];
 
 #define MODES_TIEABLE_P(MODE1, MODE2) 1
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   mep_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 enum reg_class
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 021419c..ec5e2af 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -2104,7 +2104,7 @@ enum reg_class
 
 #define CLASS_MAX_NREGS(CLASS, MODE) mips_class_max_nregs (CLASS, MODE)
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   mips_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 953c638..c4cb0fd 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -394,11 +394,11 @@ typedef struct
   ((TARGET_LARGE && ((NREGS) <= 2)) ? PSImode : choose_hard_reg_mode ((REGNO), (NREGS), false))
 
 /* Also stop GCC from thinking that it can eliminate (SUBREG:PSI (SI)).  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM,TO,CLASS) \
-  (   ((TO) == PSImode && (FROM) == SImode)	\
-   || ((TO) == SImode  && (FROM) == PSImode)    \
-   || ((TO) == DImode  && (FROM) == PSImode)    \
-   || ((TO) == PSImode && (FROM) == DImode)     \
+#define CANNOT_CHANGE_MODE_CLASS(FROM,SUBREG_BYTE,TO,CLASS) \
+  (   ((TO) == PSImode && (FROM) == SImode)		    \
+   || ((TO) == SImode  && (FROM) == PSImode)		    \
+   || ((TO) == DImode  && (FROM) == PSImode)		    \
+   || ((TO) == PSImode && (FROM) == DImode)		    \
       )
 
 #define ACCUMULATE_OUTGOING_ARGS 1
diff --git a/gcc/config/pa/pa32-regs.h b/gcc/config/pa/pa32-regs.h
index 098e9ba..83681aa 100644
--- a/gcc/config/pa/pa32-regs.h
+++ b/gcc/config/pa/pa32-regs.h
@@ -296,7 +296,7 @@ enum reg_class { NO_REGS, R1_REGS, GENERAL_REGS, FPUPPER_REGS, FP_REGS,
 
 /* Defines invalid mode changes.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   pa_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Return the class number of the smallest class containing
diff --git a/gcc/config/pa/pa64-regs.h b/gcc/config/pa/pa64-regs.h
index 002520a..583ffa3 100644
--- a/gcc/config/pa/pa64-regs.h
+++ b/gcc/config/pa/pa64-regs.h
@@ -232,7 +232,7 @@ enum reg_class { NO_REGS, R1_REGS, GENERAL_REGS, FPUPPER_REGS, FP_REGS,
 
 /* Defines invalid mode changes.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   pa_cannot_change_mode_class (FROM, TO, CLASS)
 
 /* Return the class number of the smallest class containing
diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
index d4bc19a..33d0f9f 100644
--- a/gcc/config/pdp11/pdp11.h
+++ b/gcc/config/pdp11/pdp11.h
@@ -282,7 +282,7 @@ enum reg_class { NO_REGS, MUL_REGS, GENERAL_REGS, LOAD_FPU_REGS, NO_LOAD_FPU_REG
   1									\
 )
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   pdp11_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index eb59235..b88209a 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1505,7 +1505,7 @@ extern enum reg_class rs6000_constraints[RS6000_CONSTRAINT_MAX];
 
 /* Return nonzero if for CLASS a mode change from FROM to TO is invalid.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)			\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)		\
   rs6000_cannot_change_mode_class_ptr (FROM, TO, CLASS)
 
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index bca18fe..a947836 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -419,7 +419,7 @@ enum processor_flags
    cannot use SUBREGs to switch between modes in FP registers.
    Likewise for access registers, since they have only half the
    word size on 64-bit.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		        \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	        \
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			        \
    ? ((reg_classes_intersect_p (FP_REGS, CLASS)				\
        && (GET_MODE_SIZE (FROM) < 8 || GET_MODE_SIZE (TO) < 8))		\
diff --git a/gcc/config/score/score.h b/gcc/config/score/score.h
index ca73401..d5ca021 100644
--- a/gcc/config/score/score.h
+++ b/gcc/config/score/score.h
@@ -414,8 +414,8 @@ enum reg_class
 #define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X) \
   score_secondary_reload_class (CLASS, MODE, X)
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)    \
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)        \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		       \
    ? reg_classes_intersect_p (HI_REG, (CLASS)) : 0)
 
 
diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index 9f07012..1a4c9e8 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -1149,7 +1149,7 @@ extern enum reg_class regno_reg_class[FIRST_PSEUDO_REGISTER];
    operand of a SUBREG that changes the mode of the object illegally.
    ??? We need to renumber the internal numbers for the frnn registers
    when in little endian in order to allow mode size changes.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   sh_cannot_change_mode_class (FROM, TO, CLASS)
 \f
 /* Stack layout; function entry, exit and calling.  */
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 7533e88..e3d9db8 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -912,7 +912,7 @@ extern enum reg_class sparc_regno_reg_class[FIRST_PSEUDO_REGISTER];
    Likewise for SFmode, since word-mode paradoxical subregs are
    problematic on big-endian architectures.  */
 
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS)	\
   (TARGET_ARCH64						\
    && GET_MODE_SIZE (FROM) == 4					\
    && GET_MODE_SIZE (TO) != 4					\
diff --git a/gcc/config/spu/spu.h b/gcc/config/spu/spu.h
index 64a2ba0..d0be0e3 100644
--- a/gcc/config/spu/spu.h
+++ b/gcc/config/spu/spu.h
@@ -226,7 +226,7 @@ enum reg_class {
 
 /* GCC assumes that modes are in the lowpart of a register, which is
    only true for SPU. */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
         ((GET_MODE_SIZE (FROM) > 4 || GET_MODE_SIZE (TO) > 4) \
 	 && (GET_MODE_SIZE (FROM) < 16 || GET_MODE_SIZE (TO) < 16) \
 	 && GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO))
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 84c0444..7bc37a8 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -1968,11 +1968,12 @@ value @samp{(reg:HI 4)}.
 @cindex @code{CANNOT_CHANGE_MODE_CLASS} and subreg semantics
 The rules above apply to both pseudo @var{reg}s and hard @var{reg}s.
 If the semantics are not correct for particular combinations of
-@var{m1}, @var{m2} and hard @var{reg}, the target-specific code
-must ensure that those combinations are never used.  For example:
+@var{m1}, @var{subreg_byte}, @var{m2} and hard @var{reg}, the
+target-specific code must ensure that those combinations are never used.
+For example:
 
 @smallexample
-CANNOT_CHANGE_MODE_CLASS (@var{m2}, @var{m1}, @var{class})
+CANNOT_CHANGE_MODE_CLASS (@var{m2}, @var{subreg_byte}, @var{m1}, @var{class})
 @end smallexample
 
 must be true for every class @var{class} that includes @var{reg}.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c4ecd99..f49aefb 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2885,9 +2885,11 @@ This macro helps control the handling of multiple-word values
 in the reload pass.
 @end defmac
 
-@defmac CANNOT_CHANGE_MODE_CLASS (@var{from}, @var{to}, @var{class})
+@defmac CANNOT_CHANGE_MODE_CLASS (@var{from}, @var{subreg_byte}, @var{to}, @var{class})
 If defined, a C expression that returns nonzero for a @var{class} for which
-a change from mode @var{from} to mode @var{to} is invalid.
+a change from mode @var{from} at the @code{subreg} offset @var{subreg_byte}
+to mode @var{to} is invalid.  If the @code{subreg} offset is unknown, the
+size of the largest mode on the target should be used.
 
 For the example, loading 32-bit integer or floating-point objects into
 floating-point registers on the Alpha extends them to 64 bits.
@@ -2897,7 +2899,7 @@ register.  Therefore, @file{alpha.h} defines @code{CANNOT_CHANGE_MODE_CLASS}
 as below:
 
 @smallexample
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
    ? reg_classes_intersect_p (FLOAT_REGS, (CLASS)) : 0)
 @end smallexample
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 7e459eb..ca7f374 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2526,9 +2526,11 @@ This macro helps control the handling of multiple-word values
 in the reload pass.
 @end defmac
 
-@defmac CANNOT_CHANGE_MODE_CLASS (@var{from}, @var{to}, @var{class})
+@defmac CANNOT_CHANGE_MODE_CLASS (@var{from}, @var{subreg_byte}, @var{to}, @var{class})
 If defined, a C expression that returns nonzero for a @var{class} for which
-a change from mode @var{from} to mode @var{to} is invalid.
+a change from mode @var{from} at the @code{subreg} offset @var{subreg_byte}
+to mode @var{to} is invalid.  If the @code{subreg} offset is unknown, the
+size of the largest mode on the target should be used.
 
 For the example, loading 32-bit integer or floating-point objects into
 floating-point registers on the Alpha extends them to 64 bits.
@@ -2538,7 +2540,7 @@ register.  Therefore, @file{alpha.h} defines @code{CANNOT_CHANGE_MODE_CLASS}
 as below:
 
 @smallexample
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, SUBREG_BYTE, TO, CLASS) \
   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
    ? reg_classes_intersect_p (FLOAT_REGS, (CLASS)) : 0)
 @end smallexample
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index d7fa3a5..b8e3dfd 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -748,7 +748,7 @@ validate_subreg (enum machine_mode omode, enum machine_mode imode,
       if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
 	  && GET_MODE_INNER (imode) == omode)
 	;
-      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, omode))
+      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, offset, omode))
 	return false;
 #endif
 
diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index ad987f9..11a4b3e 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -716,9 +716,9 @@ extern struct target_hard_regs *this_target_hard_regs;
 
 extern const char * reg_class_names[];
 
-/* Given a hard REGN a FROM mode and a TO mode, return nonzero if
+/* Given a hard REGN a FROM mode at SUBREG_BYTE and a TO mode, return nonzero if
    REGN cannot change modes between the specified modes.  */
-#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, TO)                          \
-         CANNOT_CHANGE_MODE_CLASS (FROM, TO, REGNO_REG_CLASS (REGN))
+#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, SUBREG_BYTE, TO) \
+  CANNOT_CHANGE_MODE_CLASS (FROM, SUBREG_BYTE, TO, REGNO_REG_CLASS (REGN))
 
 #endif /* ! GCC_HARD_REG_SET_H */
diff --git a/gcc/postreload.c b/gcc/postreload.c
index 37bd9ff..8fb2f20 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -349,6 +349,8 @@ reload_cse_simplify_set (rtx set, rtx insn)
 	      && extend_op != UNKNOWN
 #ifdef CANNOT_CHANGE_MODE_CLASS
 	      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
+					    (GET_CODE (SET_DEST (set)) == SUBREG
+					     ? SUBREG_BYTE (SET_DEST (set)) : 0),
 					    word_mode,
 					    REGNO_REG_CLASS (REGNO (SET_DEST (set))))
 #endif
@@ -459,6 +461,8 @@ reload_cse_simplify_operands (rtx insn, rtx testreg)
 	     it cannot have been used in word_mode.  */
 	  else if (REG_P (SET_DEST (set))
 		   && CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
+						(GET_CODE (SET_DEST (set)) == SUBREG
+						 ? SUBREG_BYTE (SET_DEST (set)) : 0),
 						word_mode,
 						REGNO_REG_CLASS (REGNO (SET_DEST (set)))))
 	    ; /* Continue ordinary processing.  */
diff --git a/gcc/recog.c b/gcc/recog.c
index dbd9a8a..e30d81c 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -1069,7 +1069,8 @@ register_operand (rtx op, enum machine_mode mode)
 #ifdef CANNOT_CHANGE_MODE_CLASS
       if (REG_P (sub)
 	  && REGNO (sub) < FIRST_PSEUDO_REGISTER
-	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub), mode)
+	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub),
+				       SUBREG_BYTE (op), mode)
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_INT
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT
 	  /* LRA can generate some invalid SUBREGS just for matched
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 3c9ef3d..2be5774 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -389,7 +389,9 @@ mode_change_ok (enum machine_mode orig_mode, enum machine_mode new_mode,
     return false;
 
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode, new_mode);
+  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode,
+				    (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				    new_mode);
 #endif
 
   return true;
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 46288eb..6a150a4 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1222,6 +1222,7 @@ record_subregs_of_mode (rtx subreg, bitmap subregs_of_mode)
 	if (!bitmap_bit_p (invalid_mode_changes,
 			   regno * N_REG_CLASSES + rclass)
 	    && CANNOT_CHANGE_MODE_CLASS (PSEUDO_REGNO_MODE (regno),
+					 (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
 					 mode, (enum reg_class) rclass))
 	  bitmap_set_bit (invalid_mode_changes,
 			  regno * N_REG_CLASSES + rclass);
diff --git a/gcc/reload.c b/gcc/reload.c
index 96619f6..487d4d4 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -1064,7 +1064,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (in != 0 && GET_CODE (in) == SUBREG
       && (subreg_lowpart_p (in) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)), inmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)),
+				    SUBREG_BYTE (in), inmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (in))]
       && (CONSTANT_P (SUBREG_REG (in))
@@ -1113,7 +1114,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	  || (REG_P (SUBREG_REG (in))
 	      && REGNO (SUBREG_REG (in)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P
-	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)), inmode))
+	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)),
+	       SUBREG_BYTE (in), inmode))
 #endif
 	  ))
     {
@@ -1174,7 +1176,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (out != 0 && GET_CODE (out) == SUBREG
       && (subreg_lowpart_p (out) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)), outmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)),
+				    SUBREG_BYTE (out), outmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (out))]
       && (CONSTANT_P (SUBREG_REG (out))
@@ -1209,6 +1212,7 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	      && REGNO (SUBREG_REG (out)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P (REGNO (SUBREG_REG (out)),
 					   GET_MODE (SUBREG_REG (out)),
+					   SUBREG_BYTE (out),
 					   outmode))
 #endif
 	  ))
diff --git a/gcc/reload1.c b/gcc/reload1.c
index 47439ce..10d5a4e 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -6609,7 +6609,7 @@ choose_reload_regs (struct insn_chain *chain)
 		     mode MODE.  */
 		  && !REG_CANNOT_CHANGE_MODE_P (REGNO (reg_last_reload_reg[regno]),
 						GET_MODE (reg_last_reload_reg[regno]),
-						mode)
+						byte, mode)
 #endif
 		  )
 		{
@@ -8080,8 +8080,12 @@ inherit_piecemeal_p (int dest ATTRIBUTE_UNUSED,
 		     enum machine_mode mode ATTRIBUTE_UNUSED)
 {
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode, reg_raw_mode[dest])
-	  && !REG_CANNOT_CHANGE_MODE_P (src, mode, reg_raw_mode[src]));
+  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode,
+				     (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				     reg_raw_mode[dest])
+	  && !REG_CANNOT_CHANGE_MODE_P (src, mode,
+					(MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+					reg_raw_mode[src]));
 #else
   return true;
 #endif
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 38f9e36..9687110 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3533,7 +3533,7 @@ simplify_subreg_regno (unsigned int xregno, enum machine_mode xmode,
   /* Give the backend a chance to disallow the mode change.  */
   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
       && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
-      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
+      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, offset, ymode)
       /* We can use mode change in LRA for some transformations.  */
       && ! lra_in_progress)
     return -1;
-- 
1.8.3.1



* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-11 13:10                                                                 ` H.J. Lu
@ 2013-12-11 15:49                                                                   ` Richard Sandiford
  2013-12-11 16:09                                                                     ` H.J. Lu
  2013-12-14 16:32                                                                     ` H.J. Lu
  0 siblings, 2 replies; 76+ messages in thread
From: Richard Sandiford @ 2013-12-11 15:49 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Henderson, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek

"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Richard Henderson <rth@redhat.com> writes:
>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>>> a single register.  On a little-endian target, the offset cannot be
>>>> anything other than 0 in that case.
>>>>
>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>>> something that is always invalid, regardless of the target.  That kind
>>>> of situation should be rejected by target-independent code instead.
>>>
>>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>>> will be allocated to a single hard register.  That is something that we can't
>>> know in target-independent code before register allocation.
>>
>> I was thinking that if we've got a class, we've also got things like
>> CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
>> But even in the padding cases an offset-based check in C_C_M_C could
>> be derived from other information.
>>
>> subreg_get_info handles padding with:
>>
>>       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
>>       if (GET_MODE_INNER (xmode) == VOIDmode)
>>         xmode_unit = xmode;
>>       else
>>         xmode_unit = GET_MODE_INNER (xmode);
>>       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>>       gcc_assert (nregs_xmode
>>                   == (GET_MODE_NUNITS (xmode)
>>                       * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
>>       gcc_assert (hard_regno_nregs[xregno][xmode]
>>                   == (hard_regno_nregs[xregno][xmode_unit]
>>                       * GET_MODE_NUNITS (xmode)));
>>
>>       /* You can only ask for a SUBREG of a value with holes in the middle
>>          if you don't cross the holes.  (Such a SUBREG should be done by
>>          picking a different register class, or doing it in memory if
>>          necessary.)  An example of a value with holes is XCmode on 32-bit
>>          x86 with -m128bit-long-double; it's represented in 6 32-bit registers,
>>          3 for each part, but in memory it's two 128-bit parts.
>>          Padding is assumed to be at the end (not necessarily the 'high part')
>>          of each unit.  */
>>       if ((offset / GET_MODE_SIZE (xmode_unit) + 1
>>            < GET_MODE_NUNITS (xmode))
>>           && (offset / GET_MODE_SIZE (xmode_unit)
>>               != ((offset + GET_MODE_SIZE (ymode) - 1)
>>                   / GET_MODE_SIZE (xmode_unit))))
>>         {
>>           info->representable_p = false;
>>           rknown = true;
>>         }
>>
>> and I wouldn't really want to force targets to individually reproduce
>> that kind of logic at the class level.  If the worst comes to the worst
>> we could cache the difficult cases.
>>
>
> My case is x86 CANNOT_CHANGE_MODE_CLASS only needs
> to know if the subreg byte is zero or not.  It doesn't care about mode
> padding.  You are concerned about information passed to
> CANNOT_CHANGE_MODE_CLASS is too expensive for target
> to process.  It isn't the case for x86.

No, I'm concerned that by going this route, we're forcing every target
(or at least every target with wider-than-word registers, which is most
of the common ones) to implement the same target-independent restriction.
This is not an x86-specific issue.

Thanks,
Richard


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-11 15:49                                                                   ` Richard Sandiford
@ 2013-12-11 16:09                                                                     ` H.J. Lu
  2013-12-11 16:26                                                                       ` Tejas Belagod
  2013-12-14 16:32                                                                     ` H.J. Lu
  1 sibling, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-11 16:09 UTC (permalink / raw)
  To: Richard Henderson, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>> Richard Henderson <rth@redhat.com> writes:
>>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>>>> a single register.  On a little-endian target, the offset cannot be
>>>>> anything other than 0 in that case.
>>>>>
>>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>>>> something that is always invalid, regardless of the target.  That kind
>>>>> of situation should be rejected by target-independent code instead.
>>>>
>>>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>>>> will be allocated to a single hard register.  That is something that we can't
>>>> know in target-independent code before register allocation.
>>>
>>> I was thinking that if we've got a class, we've also got things like
>>> CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
>>> But even in the padding cases an offset-based check in C_C_M_C could
>>> be derived from other information.
>>>
>>> subreg_get_info handles padding with:
>>>
>>>       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
>>>       if (GET_MODE_INNER (xmode) == VOIDmode)
>>>         xmode_unit = xmode;
>>>       else
>>>         xmode_unit = GET_MODE_INNER (xmode);
>>>       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>>>       gcc_assert (nregs_xmode
>>>                   == (GET_MODE_NUNITS (xmode)
>>>                       * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
>>>       gcc_assert (hard_regno_nregs[xregno][xmode]
>>>                   == (hard_regno_nregs[xregno][xmode_unit]
>>>                       * GET_MODE_NUNITS (xmode)));
>>>
>>>       /* You can only ask for a SUBREG of a value with holes in the middle
>>>          if you don't cross the holes.  (Such a SUBREG should be done by
>>>          picking a different register class, or doing it in memory if
>>>          necessary.)  An example of a value with holes is XCmode on 32-bit
>>>          x86 with -m128bit-long-double; it's represented in 6 32-bit registers,
>>>          3 for each part, but in memory it's two 128-bit parts.
>>>          Padding is assumed to be at the end (not necessarily the 'high part')
>>>          of each unit.  */
>>>       if ((offset / GET_MODE_SIZE (xmode_unit) + 1
>>>            < GET_MODE_NUNITS (xmode))
>>>           && (offset / GET_MODE_SIZE (xmode_unit)
>>>               != ((offset + GET_MODE_SIZE (ymode) - 1)
>>>                   / GET_MODE_SIZE (xmode_unit))))
>>>         {
>>>           info->representable_p = false;
>>>           rknown = true;
>>>         }
>>>
>>> and I wouldn't really want to force targets to individually reproduce
>>> that kind of logic at the class level.  If the worst comes to the worst
>>> we could cache the difficult cases.
>>>
>>
>> My case is x86 CANNOT_CHANGE_MODE_CLASS only needs
>> to know if the subreg byte is zero or not.  It doesn't care about mode
>> padding.  You are concerned about information passed to
>> CANNOT_CHANGE_MODE_CLASS is too expensive for target
>> to process.  It isn't the case for x86.
>
> No, I'm concerned that by going this route, we're forcing every target
> (or at least every target with wider-than-word registers, which is most
> of the common ones) to implement the same target-independent restriction.
> This is not an x86-specific issue.
>

So you prefer a generic solution which makes
CANNOT_CHANGE_MODE_CLASS return true
for vector mode subreg if subreg byte != 0. Is this
correct?

Thanks.


-- 
H.J.


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-11 16:09                                                                     ` H.J. Lu
@ 2013-12-11 16:26                                                                       ` Tejas Belagod
  2013-12-11 16:35                                                                         ` H.J. Lu
  0 siblings, 1 reply; 76+ messages in thread
From: Tejas Belagod @ 2013-12-11 16:26 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Henderson, Kirill Yukhin, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

H.J. Lu wrote:
> On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>> On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford
>>> <rdsandiford@googlemail.com> wrote:
>>>> Richard Henderson <rth@redhat.com> writes:
>>>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>>>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>>>>> a single register.  On a little-endian target, the offset cannot be
>>>>>> anything other than 0 in that case.
>>>>>>
>>>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>>>>> something that is always invalid, regardless of the target.  That kind
>>>>>> of situation should be rejected by target-independent code instead.
>>>>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>>>>> will be allocated to a single hard register.  That is something that we can't
>>>>> know in target-independent code before register allocation.
>>>> I was thinking that if we've got a class, we've also got things like
>>>> CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
>>>> But even in the padding cases an offset-based check in C_C_M_C could
>>>> be derived from other information.
>>>>
>>>> subreg_get_info handles padding with:
>>>>
>>>>       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
>>>>       if (GET_MODE_INNER (xmode) == VOIDmode)
>>>>         xmode_unit = xmode;
>>>>       else
>>>>         xmode_unit = GET_MODE_INNER (xmode);
>>>>       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>>>>       gcc_assert (nregs_xmode
>>>>                   == (GET_MODE_NUNITS (xmode)
>>>>                       * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
>>>>       gcc_assert (hard_regno_nregs[xregno][xmode]
>>>>                   == (hard_regno_nregs[xregno][xmode_unit]
>>>>                       * GET_MODE_NUNITS (xmode)));
>>>>
>>>>       /* You can only ask for a SUBREG of a value with holes in the middle
>>>>          if you don't cross the holes.  (Such a SUBREG should be done by
>>>>          picking a different register class, or doing it in memory if
>>>>          necessary.)  An example of a value with holes is XCmode on 32-bit
>>>>          x86 with -m128bit-long-double; it's represented in 6 32-bit registers,
>>>>          3 for each part, but in memory it's two 128-bit parts.
>>>>          Padding is assumed to be at the end (not necessarily the 'high part')
>>>>          of each unit.  */
>>>>       if ((offset / GET_MODE_SIZE (xmode_unit) + 1
>>>>            < GET_MODE_NUNITS (xmode))
>>>>           && (offset / GET_MODE_SIZE (xmode_unit)
>>>>               != ((offset + GET_MODE_SIZE (ymode) - 1)
>>>>                   / GET_MODE_SIZE (xmode_unit))))
>>>>         {
>>>>           info->representable_p = false;
>>>>           rknown = true;
>>>>         }
>>>>
>>>> and I wouldn't really want to force targets to individually reproduce
>>>> that kind of logic at the class level.  If the worst comes to the worst
>>>> we could cache the difficult cases.
>>>>
>>> My case is x86 CANNOT_CHANGE_MODE_CLASS only needs
>>> to know if the subreg byte is zero or not.  It doesn't care about mode
>>> padding.  You are concerned about information passed to
>>> CANNOT_CHANGE_MODE_CLASS is too expensive for target
>>> to process.  It isn't the case for x86.
>> No, I'm concerned that by going this route, we're forcing every target
>> (or at least every target with wider-than-word registers, which is most
>> of the common ones) to implement the same target-independent restriction.
>> This is not an x86-specific issue.
>>
> 
> So you prefer a generic solution which makes
> CANNOT_CHANGE_MODE_CLASS return true
> for vector mode subreg if subreg byte != 0. Is this
> correct?

Do you mean a generic solution for C_C_M_C to return true for non-zero 
byte_offset vector subregs in the context of x86?

I want to clarify because in the context of 32-bit ARM little-endian, a non-zero 
byte-offset vector subreg is still a valid full hardreg. eg. for

    (subreg:DI (reg:V4SF) 8)

C_C_M_C can return 'false' as this can be resolved to a full D-reg.
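
To illustrate the arithmetic, here is a minimal sketch (not GCC code) of how a
byte offset maps onto consecutive hard registers on a little-endian target.
The struct and function names are hypothetical, and the model assumes that the
value occupies equally sized registers with no padding:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type, not part of GCC: describes which hard
   registers a subreg touches when the inner value is spread over
   consecutive registers of REG_SIZE bytes (little-endian, no padding).  */
struct subreg_span
{
  unsigned first_reg;	/* Index of the first register touched.  */
  unsigned nregs;	/* Number of registers touched.  */
  bool whole_regs;	/* True if only complete registers are covered.  */
};

/* Map a subreg of OUTER_SIZE bytes at byte OFFSET onto registers of
   REG_SIZE bytes each.  All sizes are in bytes.  */
static struct subreg_span
subreg_span_le (unsigned offset, unsigned outer_size, unsigned reg_size)
{
  struct subreg_span span;

  assert (reg_size > 0 && outer_size > 0);
  span.first_reg = offset / reg_size;
  /* Round up so that a partially covered register still counts.  */
  span.nregs = (offset % reg_size + outer_size + reg_size - 1) / reg_size;
  span.whole_regs = (offset % reg_size == 0 && outer_size % reg_size == 0);
  return span;
}
```

With 8-byte D registers, offset 8 and size 8 give first_reg == 1, nregs == 1
and whole_regs == true, whereas a 4-byte access at offset 4 into a single
16-byte register would cover only part of that register.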

Thanks,
Tejas.



* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-11 16:26                                                                       ` Tejas Belagod
@ 2013-12-11 16:35                                                                         ` H.J. Lu
  2013-12-11 16:45                                                                           ` Tejas Belagod
  0 siblings, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-11 16:35 UTC (permalink / raw)
  To: Tejas Belagod
  Cc: Richard Henderson, Kirill Yukhin, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On Wed, Dec 11, 2013 at 8:26 AM, Tejas Belagod <tbelagod@arm.com> wrote:
> H.J. Lu wrote:
>>
>> On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>>
>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>
>>>> On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford
>>>> <rdsandiford@googlemail.com> wrote:
>>>>>
>>>>> Richard Henderson <rth@redhat.com> writes:
>>>>>>
>>>>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>>>>>>
>>>>>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>>>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>>>>>> a single register.  On a little-endian target, the offset cannot be
>>>>>>> anything other than 0 in that case.
>>>>>>>
>>>>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>>>>>> something that is always invalid, regardless of the target.  That kind
>>>>>>> of situation should be rejected by target-independent code instead.
>>>>>>
>>>>>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>>>>>> will be allocated to a single hard register.  That is something that we can't
>>>>>> know in target-independent code before register allocation.
>>>>>
>>>>> I was thinking that if we've got a class, we've also got things like
>>>>> CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
>>>>> But even in the padding cases an offset-based check in C_C_M_C could
>>>>> be derived from other information.
>>>>>
>>>>> subreg_get_info handles padding with:
>>>>>
>>>>>       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
>>>>>       if (GET_MODE_INNER (xmode) == VOIDmode)
>>>>>         xmode_unit = xmode;
>>>>>       else
>>>>>         xmode_unit = GET_MODE_INNER (xmode);
>>>>>       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>>>>>       gcc_assert (nregs_xmode
>>>>>                   == (GET_MODE_NUNITS (xmode)
>>>>>                       * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
>>>>>       gcc_assert (hard_regno_nregs[xregno][xmode]
>>>>>                   == (hard_regno_nregs[xregno][xmode_unit]
>>>>>                       * GET_MODE_NUNITS (xmode)));
>>>>>
>>>>>       /* You can only ask for a SUBREG of a value with holes in the middle
>>>>>          if you don't cross the holes.  (Such a SUBREG should be done by
>>>>>          picking a different register class, or doing it in memory if
>>>>>          necessary.)  An example of a value with holes is XCmode on 32-bit
>>>>>          x86 with -m128bit-long-double; it's represented in 6 32-bit registers,
>>>>>          3 for each part, but in memory it's two 128-bit parts.
>>>>>          Padding is assumed to be at the end (not necessarily the 'high part')
>>>>>          of each unit.  */
>>>>>       if ((offset / GET_MODE_SIZE (xmode_unit) + 1
>>>>>            < GET_MODE_NUNITS (xmode))
>>>>>           && (offset / GET_MODE_SIZE (xmode_unit)
>>>>>               != ((offset + GET_MODE_SIZE (ymode) - 1)
>>>>>                   / GET_MODE_SIZE (xmode_unit))))
>>>>>         {
>>>>>           info->representable_p = false;
>>>>>           rknown = true;
>>>>>         }
>>>>>
>>>>> and I wouldn't really want to force targets to individually reproduce
>>>>> that kind of logic at the class level.  If the worst comes to the worst
>>>>> we could cache the difficult cases.
>>>>>
>>>> My case is x86 CANNOT_CHANGE_MODE_CLASS only needs
>>>> to know if the subreg byte is zero or not.  It doesn't care about mode
>>>> padding.  You are concerned about information passed to
>>>> CANNOT_CHANGE_MODE_CLASS is too expensive for target
>>>> to process.  It isn't the case for x86.
>>>
>>> No, I'm concerned that by going this route, we're forcing every target
>>> (or at least every target with wider-than-word registers, which is most
>>> of the common ones) to implement the same target-independent restriction.
>>> This is not an x86-specific issue.
>>>
>>
>> So you prefer a generic solution which makes
>> CANNOT_CHANGE_MODE_CLASS return true
>> for vector mode subreg if subreg byte != 0. Is this
>> correct?
>
>
> Do you mean a generic solution for C_C_M_C to return true for non-zero
> byte_offset vector subregs in the context of x86?
>
> I want to clarify because in the context of 32-bit ARM little-endian, a
> non-zero byte-offset vector subreg is still a valid full hardreg. eg. for
>
>    (subreg:DI (reg:V4SF) 8)
>
> C_C_M_C can return 'false' as this can be resolved to a full D-reg.
>

Does that mean subreg byte interpretation is endian-dependent?
Both little endian

(subreg:DI (reg:V4SF) 0)

and big endian

(subreg:DI (reg:V4SF) MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT)

refer to the same lower 64 bits of reg:V4SF.  Is this correct?
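
For reference, a simplified model of the lowpart byte offset, assuming a
target whose byte and word endianness agree and a non-paradoxical subreg.
The function name below is hypothetical; GCC's own helper for this is
subreg_lowpart_offset, which also handles cases this sketch ignores:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model, not GCC's implementation: return the byte offset of
   the low-order OUTER_SIZE bytes within a value of INNER_SIZE bytes.
   Offsets are counted in memory order, so the lowpart is at the start
   on little-endian and at the end on big-endian.  */
static unsigned
lowpart_byte_offset (unsigned inner_size, unsigned outer_size,
		     bool big_endian)
{
  /* Paradoxical subregs (outer wider than inner) are out of scope.  */
  assert (outer_size <= inner_size);
  return big_endian ? inner_size - outer_size : 0;
}
```

Under this model the lowpart DI of a 16-byte V4SF sits at byte offset 0 on
little-endian and at byte offset 8 on big-endian.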

-- 
H.J.


* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-11 16:35                                                                         ` H.J. Lu
@ 2013-12-11 16:45                                                                           ` Tejas Belagod
  0 siblings, 0 replies; 76+ messages in thread
From: Tejas Belagod @ 2013-12-11 16:45 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Henderson, Kirill Yukhin, Yukhin, Kirill, Jeff Law,
	Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

H.J. Lu wrote:
> On Wed, Dec 11, 2013 at 8:26 AM, Tejas Belagod <tbelagod@arm.com> wrote:
>> H.J. Lu wrote:
>>> On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
>>> <rdsandiford@googlemail.com> wrote:
>>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>> On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford
>>>>> <rdsandiford@googlemail.com> wrote:
>>>>>> Richard Henderson <rth@redhat.com> writes:
>>>>>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>>>>>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>>>>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>>>>>>> a single register.  On a little-endian target, the offset cannot be
>>>>>>>> anything other than 0 in that case.
>>>>>>>>
>>>>>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>>>>>>> something that is always invalid, regardless of the target.  That
>>>>>>>> kind
>>>>>>>> of situation should be rejected by target-independent code instead.
>>>>>>> But, we want to disable the subreg before we know whether or not
>>>>>>> (reg:V4SF X)
>>>>>>> will be allocated to a single hard register.  That is something that
>>>>>>> we can't
>>>>>>> know in target-independent code before register allocation.
>>>>>> I was thinking that if we've got a class, we've also got things like
>>>>>> CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
>>>>>> But even in the padding cases an offset-based check in C_C_M_C could
>>>>>> be derived from other information.
>>>>>>
>>>>>> subreg_get_info handles padding with:
>>>>>>
>>>>>>       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
>>>>>>       if (GET_MODE_INNER (xmode) == VOIDmode)
>>>>>>         xmode_unit = xmode;
>>>>>>       else
>>>>>>         xmode_unit = GET_MODE_INNER (xmode);
>>>>>>       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>>>>>>       gcc_assert (nregs_xmode
>>>>>>                   == (GET_MODE_NUNITS (xmode)
>>>>>>                       * HARD_REGNO_NREGS_WITH_PADDING (xregno,
>>>>>> xmode_unit)));
>>>>>>       gcc_assert (hard_regno_nregs[xregno][xmode]
>>>>>>                   == (hard_regno_nregs[xregno][xmode_unit]
>>>>>>                       * GET_MODE_NUNITS (xmode)));
>>>>>>
>>>>>>       /* You can only ask for a SUBREG of a value with holes in the
>>>>>> middle
>>>>>>          if you don't cross the holes.  (Such a SUBREG should be done
>>>>>> by
>>>>>>          picking a different register class, or doing it in memory if
>>>>>>          necessary.)  An example of a value with holes is XCmode on
>>>>>> 32-bit
>>>>>>          x86 with -m128bit-long-double; it's represented in 6 32-bit
>>>>>> registers,
>>>>>>          3 for each part, but in memory it's two 128-bit parts.
>>>>>>          Padding is assumed to be at the end (not necessarily the 'high
>>>>>> part')
>>>>>>          of each unit.  */
>>>>>>       if ((offset / GET_MODE_SIZE (xmode_unit) + 1
>>>>>>            < GET_MODE_NUNITS (xmode))
>>>>>>           && (offset / GET_MODE_SIZE (xmode_unit)
>>>>>>               != ((offset + GET_MODE_SIZE (ymode) - 1)
>>>>>>                   / GET_MODE_SIZE (xmode_unit))))
>>>>>>         {
>>>>>>           info->representable_p = false;
>>>>>>           rknown = true;
>>>>>>         }
>>>>>>
>>>>>> and I wouldn't really want to force targets to individually reproduce
>>>>>> that kind of logic at the class level.  If the worst comes to the worst
>>>>>> we could cache the difficult cases.
>>>>>>
>>>>> My case is x86 CANNOT_CHANGE_MODE_CLASS only needs
>>>>> to know if the subreg byte is zero or not.  It doesn't care about mode
>>>>> padding.  You are concerned about information passed to
>>>>> CANNOT_CHANGE_MODE_CLASS is too expensive for target
>>>>> to process.  It isn't the case for x86.
>>>> No, I'm concerned that by going this route, we're forcing every target
>>>> (or at least every target with wider-than-word registers, which is most
>>>> of the common ones) to implement the same target-independent restriction.
>>>> This is not an x86-specific issue.
>>>>
>>> So you prefer a generic solution which makes
>>> CANNOT_CHANGE_MODE_CLASS return true
>>> for vector mode subreg if subreg byte != 0. Is this
>>> correct?
>>
>> Do you mean a generic solution for C_C_M_C to return true for non-zero
>> byte_offset vector subregs in the context of x86?
>>
>> I want to clarify because in the context of 32-bit ARM little-endian, a
>> non-zero byte-offset vector subreg is still a valid full hardreg. eg. for
>>
>>    (subreg:DI (reg:V4SF) 8)
>>
>> C_C_M_C can return 'false' as this can be resolved to a full D-reg.
>>
> 
> Does that mean subreg byte interpretation is endian-dependent?
> Both little endian
> 
> (subreg:DI (reg:V4SF) 0)
> 
> and big endian
> 
> (subreg:DI (reg:V4SF) MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT)
> 
> refer to the same lower 64 bits of reg:V4SF.  Is this correct?
> 

If my understanding of endianness representation in RTL registers is correct, yes.

I said little-endian because C_C_M_C is currently gated on TARGET_BIG_ENDIAN in 
arm.h.

Thanks,
Tejas.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Patch, RTL] Eliminate redundant vec_select moves.
  2013-12-11 15:49                                                                   ` Richard Sandiford
  2013-12-11 16:09                                                                     ` H.J. Lu
@ 2013-12-14 16:32                                                                     ` H.J. Lu
  2013-12-14 22:44                                                                       ` RFC: PATCH: Add subreg_byte to REG_CANNOT_CHANGE_MODE_P H.J. Lu
  1 sibling, 1 reply; 76+ messages in thread
From: H.J. Lu @ 2013-12-14 16:32 UTC (permalink / raw)
  To: Richard Henderson, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>> Richard Henderson <rth@redhat.com> writes:
>>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
>>>>> Sorry, I don't understand.  I never said it was invalid.  I said
>>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
>>>>> a single register.  On a little-endian target, the offset cannot be
>>>>> anything other than 0 in that case.
>>>>>
>>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
>>>>> something that is always invalid, regardless of the target.  That kind
>>>>> of situation should be rejected by target-independent code instead.
>>>>
>>>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
>>>> will be allocated to a single hard register.  That is something that we can't
>>>> know in target-independent code before register allocation.
>>>
>>> I was thinking that if we've got a class, we've also got things like
>>> CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
>>> But even in the padding cases an offset-based check in C_C_M_C could
>>> be derived from other information.
>>>
>>> subreg_get_info handles padding with:
>>>
>>>       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
>>>       if (GET_MODE_INNER (xmode) == VOIDmode)
>>>         xmode_unit = xmode;
>>>       else
>>>         xmode_unit = GET_MODE_INNER (xmode);
>>>       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>>>       gcc_assert (nregs_xmode
>>>                   == (GET_MODE_NUNITS (xmode)
>>>                       * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
>>>       gcc_assert (hard_regno_nregs[xregno][xmode]
>>>                   == (hard_regno_nregs[xregno][xmode_unit]
>>>                       * GET_MODE_NUNITS (xmode)));
>>>
>>>       /* You can only ask for a SUBREG of a value with holes in the middle
>>>          if you don't cross the holes.  (Such a SUBREG should be done by
>>>          picking a different register class, or doing it in memory if
>>>          necessary.)  An example of a value with holes is XCmode on 32-bit
>>>          x86 with -m128bit-long-double; it's represented in 6 32-bit registers,
>>>          3 for each part, but in memory it's two 128-bit parts.
>>>          Padding is assumed to be at the end (not necessarily the 'high part')
>>>          of each unit.  */
>>>       if ((offset / GET_MODE_SIZE (xmode_unit) + 1
>>>            < GET_MODE_NUNITS (xmode))
>>>           && (offset / GET_MODE_SIZE (xmode_unit)
>>>               != ((offset + GET_MODE_SIZE (ymode) - 1)
>>>                   / GET_MODE_SIZE (xmode_unit))))
>>>         {
>>>           info->representable_p = false;
>>>           rknown = true;
>>>         }
>>>
>>> and I wouldn't really want to force targets to individually reproduce
>>> that kind of logic at the class level.  If the worst comes to the worst
>>> we could cache the difficult cases.
>>>
>>
>> My case is x86 CANNOT_CHANGE_MODE_CLASS only needs
>> to know if the subreg byte is zero or not.  It doesn't care about mode
>> padding.  You are concerned about information passed to
>> CANNOT_CHANGE_MODE_CLASS is too expensive for target
>> to process.  It isn't the case for x86.
>
> No, I'm concerned that by going this route, we're forcing every target
> (or at least every target with wider-than-word registers, which is most
> of the common ones) to implement the same target-independent restriction.
> This is not an x86-specific issue.
>

It may not be x86 specific. However, the decision is made
based on enum reg_class:

/* Return true if the registers in CLASS cannot represent the change from
   modes FROM at offset SUBREG_BYTE to TO.  */

bool
ix86_cannot_change_mode_class (enum machine_mode from,
                               unsigned int subreg_byte,
                               enum machine_mode to,
                               enum reg_class regclass)
{
  if (from == to)
    return false;

  /* x87 registers can't do subreg at all, as all values are reformatted
     to extended precision.  */
  if (MAYBE_FLOAT_CLASS_P (regclass))
    return true;

  if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
    {
      /* Vector registers do not support QI or HImode loads.  If we don't
         disallow a change to these modes, reload will assume it's ok to
         drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
         the vec_dupv4hi pattern.  */
      if (GET_MODE_SIZE (from) < 4)
        return true;

      /* Vector registers do not support subreg with nonzero offsets, which
         are otherwise valid for integer registers.  */
      if (subreg_byte != 0 && GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
        return true;
    }

  return false;
}

We check subreg_byte only for SSE or MMX register classes.
We could add a target-independent hook or add subreg_byte to
CANNOT_CHANGE_MODE_CLASS like my patch does.


-- 
H.J.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RFC: PATCH: Add subreg_byte to REG_CANNOT_CHANGE_MODE_P
  2013-12-14 16:32                                                                     ` H.J. Lu
@ 2013-12-14 22:44                                                                       ` H.J. Lu
  0 siblings, 0 replies; 76+ messages in thread
From: H.J. Lu @ 2013-12-14 22:44 UTC (permalink / raw)
  To: Richard Henderson, Kirill Yukhin, Tejas Belagod, Yukhin, Kirill,
	Jeff Law, Bill Schmidt, gcc-patches, Uros Bizjak, Jakub Jelinek,
	Richard Sandiford

On Sat, Dec 14, 2013 at 08:32:25AM -0800, H.J. Lu wrote:
> On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
> > "H.J. Lu" <hjl.tools@gmail.com> writes:
> >> On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford
> >> <rdsandiford@googlemail.com> wrote:
> >>> Richard Henderson <rth@redhat.com> writes:
> >>>> On 12/10/2013 10:44 AM, Richard Sandiford wrote:
> >>>>> Sorry, I don't understand.  I never said it was invalid.  I said
> >>>>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents
> >>>>> a single register.  On a little-endian target, the offset cannot be
> >>>>> anything other than 0 in that case.
> >>>>>
> >>>>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for
> >>>>> something that is always invalid, regardless of the target.  That kind
> >>>>> of situation should be rejected by target-independent code instead.
> >>>>
> >>>> But, we want to disable the subreg before we know whether or not (reg:V4SF X)
> >>>> will be allocated to a single hard register.  That is something that we can't
> >>>> know in target-independent code before register allocation.
> >>>
> >>> I was thinking that if we've got a class, we've also got things like
> >>> CLASS_MAX_NREGS.  Maybe that doesn't cope with padding properly though.
> >>> But even in the padding cases an offset-based check in C_C_M_C could
> >>> be derived from other information.
> >>>
> >>> subreg_get_info handles padding with:
> >>>
> >>>       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
> >>>       if (GET_MODE_INNER (xmode) == VOIDmode)
> >>>         xmode_unit = xmode;
> >>>       else
> >>>         xmode_unit = GET_MODE_INNER (xmode);
> >>>       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
> >>>       gcc_assert (nregs_xmode
> >>>                   == (GET_MODE_NUNITS (xmode)
> >>>                       * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
> >>>       gcc_assert (hard_regno_nregs[xregno][xmode]
> >>>                   == (hard_regno_nregs[xregno][xmode_unit]
> >>>                       * GET_MODE_NUNITS (xmode)));
> >>>
> >>>       /* You can only ask for a SUBREG of a value with holes in the middle
> >>>          if you don't cross the holes.  (Such a SUBREG should be done by
> >>>          picking a different register class, or doing it in memory if
> >>>          necessary.)  An example of a value with holes is XCmode on 32-bit
> >>>          x86 with -m128bit-long-double; it's represented in 6 32-bit registers,
> >>>          3 for each part, but in memory it's two 128-bit parts.
> >>>          Padding is assumed to be at the end (not necessarily the 'high part')
> >>>          of each unit.  */
> >>>       if ((offset / GET_MODE_SIZE (xmode_unit) + 1
> >>>            < GET_MODE_NUNITS (xmode))
> >>>           && (offset / GET_MODE_SIZE (xmode_unit)
> >>>               != ((offset + GET_MODE_SIZE (ymode) - 1)
> >>>                   / GET_MODE_SIZE (xmode_unit))))
> >>>         {
> >>>           info->representable_p = false;
> >>>           rknown = true;
> >>>         }
> >>>
> >>> and I wouldn't really want to force targets to individually reproduce
> >>> that kind of logic at the class level.  If the worst comes to the worst
> >>> we could cache the difficult cases.
> >>>
> >>
> >> My case is x86 CANNOT_CHANGE_MODE_CLASS only needs
> >> to know if the subreg byte is zero or not.  It doesn't care about mode
> >> padding.  You are concerned about information passed to
> >> CANNOT_CHANGE_MODE_CLASS is too expensive for target
> >> to process.  It isn't the case for x86.
> >
> > No, I'm concerned that by going this route, we're forcing every target
> > (or at least every target with wider-than-word registers, which is most
> > of the common ones) to implement the same target-independent restriction.
> > This is not an x86-specific issue.
> >
> 
> It may not be x86 specific. However, the decision is made
> based on enum reg_class:
> 
> /* Return true if the registers in CLASS cannot represent the change from
>    modes FROM at offset SUBREG_BYTE to TO.  */
> 
> bool
> ix86_cannot_change_mode_class (enum machine_mode from,
>                                unsigned int subreg_byte,
>                                enum machine_mode to,
>                                enum reg_class regclass)
> {
>   if (from == to)
>     return false;
> 
>   /* x87 registers can't do subreg at all, as all values are reformatted
>      to extended precision.  */
>   if (MAYBE_FLOAT_CLASS_P (regclass))
>     return true;
> 
>   if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
>     {
>       /* Vector registers do not support QI or HImode loads.  If we don't
>          disallow a change to these modes, reload will assume it's ok to
>          drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>          the vec_dupv4hi pattern.  */
>       if (GET_MODE_SIZE (from) < 4)
>         return true;
> 
>       /* Vector registers do not support subreg with nonzero offsets, which
>          are otherwise valid for integer registers.  */
>       if (subreg_byte != 0 && GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
>         return true;
>     }
> 
>   return false;
> }
> 
> We check subreg_byte only for SSE or MMX register classes.
> We could add a target-independent hook or add subreg_byte to
> CANNOT_CHANGE_MODE_CLASS like my patch does.
> 

Here is a patch.  It introduces a new macro, MAYBE_VECTOR_CLASS_P,
to check if a class of registers may be vector registers.  It adds
CANNOT_CHANGE_MODE_CLASS_P and replaces uses of
CANNOT_CHANGE_MODE_CLASS with CANNOT_CHANGE_MODE_CLASS_P, which takes
subreg_byte and returns true for a vector class whenever subreg_byte is
non-zero and the new mode is narrower than the old one.  I use
MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT to indicate an unknown
subreg_byte.  Tested on Linux/x86-64 without regressions
using -m32, -m64 and -mx32.  Any comments?
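
To make the intended behaviour concrete, here is a minimal stand-alone
sketch of the check that CANNOT_CHANGE_MODE_CLASS_P performs.  It is a
model only: struct mode_change, cannot_change_mode_class_p and the two
boolean parameters are hypothetical stand-ins for the machine modes,
MAYBE_VECTOR_CLASS_P (CLASS) and the target's existing
CANNOT_CHANGE_MODE_CLASS answer.

```c
#include <stdbool.h>

/* Hypothetical model of one mode-change query.  In GCC these values
   come from machine modes and the subreg itself.  */
struct mode_change
{
  unsigned int from_size;    /* stands in for GET_MODE_SIZE (FROM) */
  unsigned int to_size;      /* stands in for GET_MODE_SIZE (TO) */
  unsigned int subreg_byte;  /* byte offset of the access */
};

/* Return true if the change must be rejected.  MAYBE_VECTOR_CLASS models
   MAYBE_VECTOR_CLASS_P (CLASS); TARGET_REJECTS models the result of the
   target's CANNOT_CHANGE_MODE_CLASS (FROM, TO, CLASS).  */
static bool
cannot_change_mode_class_p (struct mode_change c, bool maybe_vector_class,
                            bool target_rejects)
{
  /* Generic rule: a narrowing access at a non-zero offset is never
     allowed for a class that may contain vector registers.  */
  if (maybe_vector_class && c.subreg_byte != 0 && c.to_size < c.from_size)
    return true;

  /* Otherwise defer to the target-specific answer.  */
  return target_rejects;
}
```

Because the "unknown" value is non-zero, a caller that does not know the
offset gets the conservative answer for a narrowing access to a vector
class.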

Thanks.


H.J.
---
2013-12-14   H.J. Lu  <hongjiu.lu@intel.com>

	* combine.c (subst): Pass subreg byte to REG_CANNOT_CHANGE_MODE_P.
	(simplify_set): Likewise.
	* emit-rtl.c (validate_subreg): Likewise.
	* recog.c (register_operand): Likewise.
	* rtlanal.c (simplify_subreg_regno): Likewise.
	* defaults.h (MAYBE_VECTOR_CLASS_P): New macro.
	* hard-reg-set.h (CANNOT_CHANGE_MODE_CLASS_P): New macro.
	(REG_CANNOT_CHANGE_MODE_P): Add SUBREG_BYTE and pass it to
	CANNOT_CHANGE_MODE_CLASS_P.
	* postreload.c (reload_cse_simplify_set): Pass subreg byte to
	CANNOT_CHANGE_MODE_CLASS_P.
	(reload_cse_simplify_operands): Likewise.
	* reload.c (push_reload): Likewise.
	* regcprop.c (mode_change_ok): Pass unknown subreg byte to
	REG_CANNOT_CHANGE_MODE_P.
	* reginfo.c (record_subregs_of_mode): Likewise.
	* reload1.c (choose_reload_regs): Pass subreg byte to
	REG_CANNOT_CHANGE_MODE_P.
	(inherit_piecemeal_p): Pass unknown subreg byte to
	REG_CANNOT_CHANGE_MODE_P.
	* config/i386/i386.c (ix86_cannot_change_mode_class): Don't
	check mode size.
	* config/i386/i386.h (MAYBE_VECTOR_CLASS_P): New macro.
	* doc/tm.texi.in: Document MAYBE_VECTOR_CLASS_P.
	* doc/tm.texi: Regenerated.

diff --git a/gcc/combine.c b/gcc/combine.c
index dea6c28..8e3b962 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5084,6 +5084,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 		      && REGNO (to) < FIRST_PSEUDO_REGISTER
 		      && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
 						   GET_MODE (to),
+						   SUBREG_BYTE (x),
 						   GET_MODE (x)))
 		    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
 #endif
@@ -6450,6 +6451,7 @@ simplify_set (rtx x)
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
 					 GET_MODE (SUBREG_REG (src)),
+					 SUBREG_BYTE (src),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ecf5e0b..8d71ffb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35057,13 +35057,6 @@ ix86_cannot_change_mode_class (enum machine_mode from, enum machine_mode to,
 	 the vec_dupv4hi pattern.  */
       if (GET_MODE_SIZE (from) < 4)
 	return true;
-
-      /* Vector registers do not support subreg with nonzero offsets, which
-	 are otherwise valid for integer registers.  Since we can't see
-	 whether we have a nonzero offset from here, prohibit all
-         nonparadoxical subregs changing size.  */
-      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
-	return true;
     }
 
   return false;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7efd1e0..d6156d7 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1522,10 +1522,15 @@ enum reg_class
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
    : MODE)
 
-/* Return a class of registers that cannot change FROM mode to TO mode.  */
+/* Return true if the registers in CLASS cannot represent the change
+   from mode FROM to mode TO.  */
 
 #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
   ix86_cannot_change_mode_class (FROM, TO, CLASS)
+
+/* Return true if the register CLASS may be a vector class.  */
+#define MAYBE_VECTOR_CLASS_P(CLASS) \
+  (MAYBE_SSE_CLASS_P (CLASS) || MAYBE_MMX_CLASS_P (CLASS))
 \f
 /* Stack layout; function entry, exit and calling.  */
 
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 1d12aef..e7fbf27 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1388,6 +1388,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define SWITCHABLE_TARGET 0
 #endif
 
+#ifndef MAYBE_VECTOR_CLASS_P
+#define MAYBE_VECTOR_CLASS_P(CLASS) false
+#endif
+
 #endif /* GCC_INSN_FLAGS_H  */
 
 #endif  /* ! GCC_DEFAULTS_H */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8abb3ef..94757c1 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2898,6 +2898,20 @@ as below:
 @end smallexample
 @end defmac
 
+@defmac MAYBE_VECTOR_CLASS_P (@var{class})
+A C expression that returns @code{true} for a @var{class} of registers
+which may be vector registers.  Defaults to @code{false}.
+
+For the example, on x86 system, MMX and SSE registers are vector
+registers.  Therefore, @file{i386.h} defines @code{MAYBE_VECTOR_CLASS_P}
+as below:
+
+@smallexample
+#define MAYBE_VECTOR_CLASS_P(CLASS) \
+  (MAYBE_SSE_CLASS_P (CLASS) || MAYBE_MMX_CLASS_P (CLASS))
+@end smallexample
+@end defmac
+
 @deftypefn {Target Hook} bool TARGET_LRA_P (void)
 A target hook which returns true if we use LRA instead of reload pass.  It means that LRA was ported to the target.    The default version of this target hook returns always false.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index deedb41..6ca29f8 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2539,6 +2539,20 @@ as below:
 @end smallexample
 @end defmac
 
+@defmac MAYBE_VECTOR_CLASS_P (@var{class})
+A C expression that returns @code{true} for a @var{class} of registers
+which may be vector registers.  Defaults to @code{false}.
+
+For the example, on x86 system, MMX and SSE registers are vector
+registers.  Therefore, @file{i386.h} defines @code{MAYBE_VECTOR_CLASS_P}
+as below:
+
+@smallexample
+#define MAYBE_VECTOR_CLASS_P(CLASS) \
+  (MAYBE_SSE_CLASS_P (CLASS) || MAYBE_MMX_CLASS_P (CLASS))
+@end smallexample
+@end defmac
+
 @hook TARGET_LRA_P
 
 @hook TARGET_REGISTER_PRIORITY
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index d7fa3a5..b8e3dfd 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -748,7 +748,7 @@ validate_subreg (enum machine_mode omode, enum machine_mode imode,
       if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
 	  && GET_MODE_INNER (imode) == omode)
 	;
-      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, omode))
+      else if (REG_CANNOT_CHANGE_MODE_P (regno, imode, offset, omode))
 	return false;
 #endif
 
diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index ad987f9..d488a33 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -716,9 +716,17 @@ extern struct target_hard_regs *this_target_hard_regs;
 
 extern const char * reg_class_names[];
 
-/* Given a hard REGN a FROM mode and a TO mode, return nonzero if
-   REGN cannot change modes between the specified modes.  */
-#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, TO)                          \
-         CANNOT_CHANGE_MODE_CLASS (FROM, TO, REGNO_REG_CLASS (REGN))
+/* Return true if the registers in CLASS cannot represent the change
+   from mode FROM at offset SUBREG_BYTE to mode TO.  */
+#define CANNOT_CHANGE_MODE_CLASS_P(FROM, SUBREG_BYTE, TO, CLASS) \
+  ((MAYBE_VECTOR_CLASS_P (CLASS)				 \
+    && (SUBREG_BYTE) != 0					 \
+    && GET_MODE_SIZE (TO) < GET_MODE_SIZE (FROM))		 \
+   || CANNOT_CHANGE_MODE_CLASS (FROM, TO, CLASS))
+
+/* Given a hard REGN a FROM mode at SUBREG_BYTE and a TO mode, return
+   true if REGN cannot change modes between the specified modes.  */
+#define REG_CANNOT_CHANGE_MODE_P(REGN, FROM, SUBREG_BYTE, TO) \
+  CANNOT_CHANGE_MODE_CLASS_P (FROM, SUBREG_BYTE, TO, REGNO_REG_CLASS (REGN))
 
 #endif /* ! GCC_HARD_REG_SET_H */
diff --git a/gcc/postreload.c b/gcc/postreload.c
index 37bd9ff..a3629c7 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -348,9 +348,11 @@ reload_cse_simplify_set (rtx set, rtx insn)
 	  if (GET_MODE_BITSIZE (GET_MODE (SET_DEST (set))) < BITS_PER_WORD
 	      && extend_op != UNKNOWN
 #ifdef CANNOT_CHANGE_MODE_CLASS
-	      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
-					    word_mode,
-					    REGNO_REG_CLASS (REGNO (SET_DEST (set))))
+	      && !CANNOT_CHANGE_MODE_CLASS_P (GET_MODE (SET_DEST (set)),
+					      (GET_CODE (SET_DEST (set)) == SUBREG
+					       ? SUBREG_BYTE (SET_DEST (set)) : 0),
+					      word_mode,
+					      REGNO_REG_CLASS (REGNO (SET_DEST (set))))
 #endif
 	      )
 	    {
@@ -458,9 +460,11 @@ reload_cse_simplify_operands (rtx insn, rtx testreg)
 	  /* If the register cannot change mode to word_mode, it follows that
 	     it cannot have been used in word_mode.  */
 	  else if (REG_P (SET_DEST (set))
-		   && CANNOT_CHANGE_MODE_CLASS (GET_MODE (SET_DEST (set)),
-						word_mode,
-						REGNO_REG_CLASS (REGNO (SET_DEST (set)))))
+		   && CANNOT_CHANGE_MODE_CLASS_P (GET_MODE (SET_DEST (set)),
+						  (GET_CODE (SET_DEST (set)) == SUBREG
+						   ? SUBREG_BYTE (SET_DEST (set)) : 0),
+						  word_mode,
+						  REGNO_REG_CLASS (REGNO (SET_DEST (set)))))
 	    ; /* Continue ordinary processing.  */
 #endif
 	  /* If this is a straight load, make the extension explicit.  */
diff --git a/gcc/recog.c b/gcc/recog.c
index dbd9a8a..e30d81c 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -1069,7 +1069,8 @@ register_operand (rtx op, enum machine_mode mode)
 #ifdef CANNOT_CHANGE_MODE_CLASS
       if (REG_P (sub)
 	  && REGNO (sub) < FIRST_PSEUDO_REGISTER
-	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub), mode)
+	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub),
+				       SUBREG_BYTE (op), mode)
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_INT
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT
 	  /* LRA can generate some invalid SUBREGS just for matched
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 3c9ef3d..2be5774 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -389,7 +389,9 @@ mode_change_ok (enum machine_mode orig_mode, enum machine_mode new_mode,
     return false;
 
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode, new_mode);
+  return !REG_CANNOT_CHANGE_MODE_P (regno, orig_mode,
+				    (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				    new_mode);
 #endif
 
   return true;
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 46288eb..1f81227 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1221,8 +1221,9 @@ record_subregs_of_mode (rtx subreg, bitmap subregs_of_mode)
       for (rclass = 0; rclass < N_REG_CLASSES; rclass++)
 	if (!bitmap_bit_p (invalid_mode_changes,
 			   regno * N_REG_CLASSES + rclass)
-	    && CANNOT_CHANGE_MODE_CLASS (PSEUDO_REGNO_MODE (regno),
-					 mode, (enum reg_class) rclass))
+	    && CANNOT_CHANGE_MODE_CLASS_P (PSEUDO_REGNO_MODE (regno),
+					   (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+					   mode, (enum reg_class) rclass))
 	  bitmap_set_bit (invalid_mode_changes,
 			  regno * N_REG_CLASSES + rclass);
     }
diff --git a/gcc/reload.c b/gcc/reload.c
index 96619f6..58a6143 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -1064,7 +1064,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (in != 0 && GET_CODE (in) == SUBREG
       && (subreg_lowpart_p (in) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)), inmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS_P (GET_MODE (SUBREG_REG (in)),
+				      SUBREG_BYTE (in), inmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (in))]
       && (CONSTANT_P (SUBREG_REG (in))
@@ -1113,7 +1114,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	  || (REG_P (SUBREG_REG (in))
 	      && REGNO (SUBREG_REG (in)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P
-	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)), inmode))
+	      (REGNO (SUBREG_REG (in)), GET_MODE (SUBREG_REG (in)),
+	       SUBREG_BYTE (in), inmode))
 #endif
 	  ))
     {
@@ -1174,7 +1176,8 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
   if (out != 0 && GET_CODE (out) == SUBREG
       && (subreg_lowpart_p (out) || strict_low)
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)), outmode, rclass)
+      && !CANNOT_CHANGE_MODE_CLASS_P (GET_MODE (SUBREG_REG (out)),
+				      SUBREG_BYTE (out), outmode, rclass)
 #endif
       && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (out))]
       && (CONSTANT_P (SUBREG_REG (out))
@@ -1209,6 +1212,7 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	      && REGNO (SUBREG_REG (out)) < FIRST_PSEUDO_REGISTER
 	      && REG_CANNOT_CHANGE_MODE_P (REGNO (SUBREG_REG (out)),
 					   GET_MODE (SUBREG_REG (out)),
+					   SUBREG_BYTE (out),
 					   outmode))
 #endif
 	  ))
diff --git a/gcc/reload1.c b/gcc/reload1.c
index 47439ce..10d5a4e 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -6609,7 +6609,7 @@ choose_reload_regs (struct insn_chain *chain)
 		     mode MODE.  */
 		  && !REG_CANNOT_CHANGE_MODE_P (REGNO (reg_last_reload_reg[regno]),
 						GET_MODE (reg_last_reload_reg[regno]),
-						mode)
+						byte, mode)
 #endif
 		  )
 		{
@@ -8080,8 +8080,12 @@ inherit_piecemeal_p (int dest ATTRIBUTE_UNUSED,
 		     enum machine_mode mode ATTRIBUTE_UNUSED)
 {
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode, reg_raw_mode[dest])
-	  && !REG_CANNOT_CHANGE_MODE_P (src, mode, reg_raw_mode[src]));
+  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode,
+				     (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+				     reg_raw_mode[dest])
+	  && !REG_CANNOT_CHANGE_MODE_P (src, mode,
+					(MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT),
+					reg_raw_mode[src]));
 #else
   return true;
 #endif
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 38f9e36..9687110 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3533,7 +3533,7 @@ simplify_subreg_regno (unsigned int xregno, enum machine_mode xmode,
   /* Give the backend a chance to disallow the mode change.  */
   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
       && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
-      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
+      && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, offset, ymode)
       /* We can use mode change in LRA for some transformations.  */
       && ! lra_in_progress)
     return -1;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279] (was: [Patch, RTL] Eliminate redundant vec_select moves)
  2013-12-04 16:07                       ` Tejas Belagod
                                           ` (2 preceding siblings ...)
  2013-12-04 20:04                         ` Jeff Law
@ 2020-04-22 16:43                         ` Thomas Schwinge
  2020-04-22 17:01                           ` [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279] Andrew Stubbs
  3 siblings, 1 reply; 76+ messages in thread
From: Thomas Schwinge @ 2020-04-22 16:43 UTC (permalink / raw)
  To: Tejas Belagod, belagod, gcc-patches, Richard Sandiford, Richard Biener
  Cc: Bill Schmidt, Andrew Stubbs, Julian Brown

[-- Attachment #1: Type: text/plain, Size: 3662 bytes --]

Hi!

First: please be gentle: I don't speak RTL.  ;-) And second: it's been
some time.

On 2013-12-04T16:06:48+0000, Tejas Belagod <tbelagod@arm.com> wrote:
> gcc/
>       * rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select
>       for overlapping register lanes.

This got committed to trunk in r205712.

> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1180,6 +1180,26 @@ set_noop_p (const_rtx set)
>        dst = SUBREG_REG (dst);
>      }
>
> +  /* It is a NOOP if destination overlaps with selected src vector
> +     elements.  */
> +  if (GET_CODE (src) == VEC_SELECT
> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
> +      && HARD_REGISTER_P (XEXP (src, 0))
> +      && HARD_REGISTER_P (dst))
> +    {
> +      int i;
> +      rtx par = XEXP (src, 1);
> +      rtx src0 = XEXP (src, 0);
> +      int c0 = INTVAL (XVECEXP (par, 0, 0));
> +      HOST_WIDE_INT offset = GET_MODE_UNIT_SIZE (GET_MODE (src0)) * c0;
> +
> +      for (i = 1; i < XVECLEN (par, 0); i++)
> +     if (INTVAL (XVECEXP (par, 0, i)) != c0 + i)
> +       return 0;
> +      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
> +                                 offset, GET_MODE (dst)) == (int)REGNO (dst);
> +    }
> +
>    return (REG_P (src) && REG_P (dst)
>         && REGNO (src) == REGNO (dst));
>  }

In <https://gcc.gnu.org/PR94279> "[amdgcn] internal compiler error: RTL
check: expected code 'const_int', have 'reg' in rtx_to_poly_int64, at
rtl.h:2379", we recently found that it's wrong to expect constant
selectors, at least in the current code and its usage context.  (Thanks,
Richard Biener for the guidance!)  Not too many actually, but of course,
this code has seen some changes since 2013-12-04 (for example, r261530
"Use poly_int rtx accessors instead of hwi accessors"), and also the
context may have changed that it's being used in -- so, I'm not sure
whether the original code (as quoted above) is actually buggy already,
but it already does contain the pattern that 'INTVAL' is used on
something without making sure that we're actually dealing with a constant
selector.  (Might that have been an impossible scenario back then?)
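
To illustrate the pattern in isolation, here is a minimal, self-contained
sketch.  It is a toy model and not GCC code: 'selector_elt',
'const_lane_p' and 'selects_consecutive_lanes_p' are hypothetical names
made up for this example, with 'const_lane_p' standing in for a guarded
accessor in the style of 'poly_int_rtx_p'.  The point is only that each
selector element is checked for being a constant before its value is
read, and that a non-constant element makes the "is this a no-op" test
fail instead of tripping a checking assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for one element of a vec_select selector (hypothetical,
// not GCC's rtx): either a constant lane index or a register.
struct selector_elt {
  bool is_const;       // true: 'value' is a lane index
  std::int64_t value;  // lane index if is_const, else a register number
};

// Guarded accessor: only hands out the value if the element is constant.
static bool const_lane_p(const selector_elt &e, std::int64_t *out) {
  assert(out != nullptr);
  if (!e.is_const)
    return false;
  *out = e.value;
  return true;
}

// Returns true iff the selector picks constant lanes c0, c0+1, c0+2, ...
// and stores c0 in *first.  Any non-constant element yields false.
static bool selects_consecutive_lanes_p(const std::vector<selector_elt> &par,
                                        std::int64_t *first) {
  std::int64_t c0;
  if (par.empty() || !const_lane_p(par[0], &c0))
    return false;
  for (std::size_t i = 1; i < par.size(); i++) {
    std::int64_t ci;
    if (!const_lane_p(par[i], &ci)
        || ci != c0 + static_cast<std::int64_t>(i))
      return false;
  }
  *first = c0;
  return true;
}
```

With this sketch, a selector of constant lanes 2 and 3 is accepted with
first lane 2, while a selector containing a register element is simply
rejected.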

Anyway.  Attached is a WIP patch "[rtl] Harden 'set_noop_p' for
non-constant selectors [PR94279]".  Richard Biener said that "A patch
like along that line is pre-approved", but given my unfamiliarity with
what I'm dealing with here, I'd like that reviewed properly, please.  :-)
If approving this patch, please respond with "Reviewed-by: NAME <EMAIL>"
so that your effort will be recorded in the commit log, see
<https://gcc.gnu.org/wiki/Reviewed-by>.

I'll schedule x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu
bootstrap testing.  What other testing does this need?  (Asking as this
seems to have been added for aarch64, which I'm not set up to test.)  So
far, I've only confirmed that it does solve the RTL checking issue with
libgomp AMD GCN offloading testing.

Then, should this also be backported to release branches?  GCC 9: same
patch as for master branch.  GCC 8: pre poly_int, so only need to guard
'INTVAL' (by 'CONST_INT_P', right?).  Or, is that not worth it, given
that nobody found this to be a problem until now (as far as I know),
and/or it's maybe really specific to (or, exposed by) AMD GCN's vector
instructions?  (For AMD GCN offloading, we only care about master
branch.)


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: 0001-rtl-Harden-set_noop_p-for-non-constant-selectors-PR9.patch --]
[-- Type: text/x-diff, Size: 1217 bytes --]

From 3546ac8ef47cf67570834e5a70614907bef40304 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 22 Apr 2020 16:58:44 +0200
Subject: [PATCH] [rtl] Harden 'set_noop_p' for non-constant selectors
 [PR94279]

---
 gcc/rtlanal.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index c7ab86e228b1..0ebde7622db6 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1631,12 +1631,18 @@ set_noop_p (const_rtx set)
       int i;
       rtx par = XEXP (src, 1);
       rtx src0 = XEXP (src, 0);
-      poly_int64 c0 = rtx_to_poly_int64 (XVECEXP (par, 0, 0));
+      poly_int64 c0;
+      if (!poly_int_rtx_p (XVECEXP (par, 0, 0), &c0))
+	return 0;
       poly_int64 offset = GET_MODE_UNIT_SIZE (GET_MODE (src0)) * c0;
 
       for (i = 1; i < XVECLEN (par, 0); i++)
-	if (maybe_ne (rtx_to_poly_int64 (XVECEXP (par, 0, i)), c0 + i))
-	  return 0;
+	{
+	  poly_int64 c0i;
+	  if (!poly_int_rtx_p (XVECEXP (par, 0, i), &c0i)
+	      || maybe_ne (c0i, c0 + i))
+	    return 0;
+	}
       return
 	REG_CAN_CHANGE_MODE_P (REGNO (dst), GET_MODE (src0), GET_MODE (dst))
 	&& simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
-- 
2.25.1



* Re: [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279]
  2020-04-22 16:43                         ` [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279] (was: [Patch, RTL] Eliminate redundant vec_select moves) Thomas Schwinge
@ 2020-04-22 17:01                           ` Andrew Stubbs
  2020-04-22 17:23                             ` Richard Sandiford
  0 siblings, 1 reply; 76+ messages in thread
From: Andrew Stubbs @ 2020-04-22 17:01 UTC (permalink / raw)
  To: Thomas Schwinge, Tejas Belagod, belagod, gcc-patches,
	Richard Sandiford, Richard Biener
  Cc: Bill Schmidt, Julian Brown

On 22/04/2020 17:43, Thomas Schwinge wrote:
> In <https://gcc.gnu.org/PR94279> "[amdgcn] internal compiler error: RTL
> check: expected code 'const_int', have 'reg' in rtx_to_poly_int64, at
> rtl.h:2379", we recently found that it's wrong to expect constant
> selectors, at least in the current code and its usage context.  (Thanks,
> Richard Biener for the guidance!)  Not too many actually, but of course,
> this code has seen some changes since 2013-12-04 (for example, r261530
> "Use poly_int rtx accessors instead of hwi accessors"), and also the
> context may have changed that it's being used in -- so, I'm not sure
> whether the original code (as quoted above) is actually buggy already,
> but it already does contain the pattern that 'INTVAL' is used on
> something without making sure that we're actually dealing with a constant
> selector.  (Might that have been an impossible scenario back then?)

I think it was impossible. See 
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-09/msg00273.html

> Then, should this also be backported to release branches?  GCC 9: same
> patch as for master branch.  GCC 8: pre poly_int, so only need to guard
> 'INTVAL' (by 'CONST_INT_P', right?).  Or, is that not worth it, given
> that nobody found this to be a problem until now (as far as I know),
> and/or it's maybe really specific to (or, exposed by) AMD GCN's vector
> instructions?  (For AMD GCN offloading, we only care about master
> branch.)

I don't think it's needed prior to GCC 9, and then only for amdgcn which 
was probably not widely used.

Andrew


* Re: [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279]
  2020-04-22 17:01                           ` [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279] Andrew Stubbs
@ 2020-04-22 17:23                             ` Richard Sandiford
  2020-04-29  8:44                               ` Thomas Schwinge
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Sandiford @ 2020-04-22 17:23 UTC (permalink / raw)
  To: Andrew Stubbs
  Cc: Thomas Schwinge, gcc-patches, Richard Biener, Bill Schmidt, Julian Brown

Andrew Stubbs <ams@codesourcery.com> writes:
> On 22/04/2020 17:43, Thomas Schwinge wrote:
>> In <https://gcc.gnu.org/PR94279> "[amdgcn] internal compiler error: RTL
>> check: expected code 'const_int', have 'reg' in rtx_to_poly_int64, at
>> rtl.h:2379", we recently found that it's wrong to expect constant
>> selectors, at least in the current code and its usage context.  (Thanks,
>> Richard Biener for the guidance!)  Not too many actually, but of course,
>> this code has seen some changes since 2013-12-04 (for example, r261530
>> "Use poly_int rtx accessors instead of hwi accessors"), and also the
>> context may have changed that it's being used in -- so, I'm not sure
>> whether the original code (as quoted above) is actually buggy already,
>> but it already does contain the pattern that 'INTVAL' is used on
>> something without making sure that we're actually dealing with a constant
>> selector.  (Might that have been an impossible scenario back then?)
>
> I think it was impossible. See 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2018-09/msg00273.html

Ah!  Thanks for the link.

>> Then, should this also be backported to release branches?  GCC 9: same
>> patch as for master branch.  GCC 8: pre poly_int, so only need to guard
>> 'INTVAL' (by 'CONST_INT_P', right?).  Or, is that not worth it, given
>> that nobody found this to be a problem until now (as far as I know),
>> and/or it's maybe really specific to (or, exposed by) AMD GCN's vector
>> instructions?  (For AMD GCN offloading, we only care about master
>> branch.)
>
> I don't think it's needed prior to GCC 9, and then only for amdgcn which 
> was probably not widely used.

Based on that, OK for master and GCC 9.

Thanks,
Richard


* Re: [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279]
  2020-04-22 17:23                             ` Richard Sandiford
@ 2020-04-29  8:44                               ` Thomas Schwinge
  0 siblings, 0 replies; 76+ messages in thread
From: Thomas Schwinge @ 2020-04-29  8:44 UTC (permalink / raw)
  To: Richard Sandiford, Andrew Stubbs, gcc-patches, belagod
  Cc: Richard Biener, Bill Schmidt, Julian Brown

[-- Attachment #1: Type: text/plain, Size: 2630 bytes --]

Hi!

On 2020-04-22T18:23:24+0100, Richard Sandiford <richard.sandiford@arm.com> wrote:
> Andrew Stubbs <ams@codesourcery.com> writes:
>> On 22/04/2020 17:43, Thomas Schwinge wrote:
>>> In <https://gcc.gnu.org/PR94279> "[amdgcn] internal compiler error: RTL
>>> check: expected code 'const_int', have 'reg' in rtx_to_poly_int64, at
>>> rtl.h:2379", we recently found that it's wrong to expect constant
>>> selectors, at least in the current code and its usage context.  (Thanks,
>>> Richard Biener for the guidance!)  Not too many actually, but of course,
>>> this code has seen some changes since 2013-12-04 (for example, r261530
>>> "Use poly_int rtx accessors instead of hwi accessors"), and also the
>>> context may have changed that it's being used in -- so, I'm not sure
>>> whether the original code (as quoted above) is actually buggy already,
>>> but it already does contain the pattern that 'INTVAL' is used on
>>> something without making sure that we're actually dealing with a constant
>>> selector.  (Might that have been an impossible scenario back then?)
>>
>> I think it was impossible. See
>> https://gcc.gnu.org/legacy-ml/gcc-patches/2018-09/msg00273.html
>
> Ah!  Thanks for the link.

Many thanks indeed! That gives confidence why we're running into this
problem just now, for GCN target only -- Tejas' original patch thus is
not to blame at all, good.

(..., and hopefully we won't find much more fall-out due to the GCN
target doing away with the constant 'vec_select' restriction...)

>>> Then, should this also be backported to release branches?  GCC 9: same
>>> patch as for master branch.  GCC 8: pre poly_int, so only need to guard
>>> 'INTVAL' (by 'CONST_INT_P', right?).  Or, is that not worth it, given
>>> that nobody found this to be a problem until now (as far as I know),
>>> and/or it's maybe really specific to (or, exposed by) AMD GCN's vector
>>> instructions?  (For AMD GCN offloading, we only care about master
>>> branch.)
>>
>> I don't think it's needed prior to GCC 9, and then only for amdgcn which
>> was probably not widely used.
>
> Based on that, OK for master and GCC 9.

Thanks for the quick review.  I'll later see about backporting for GCC 9.
For now pushed to master branch in commit
f2c2eaaf8fb5c66ae372bb526b2b2fe67a9c5c39 "[rtl] Harden 'set_noop_p' for
non-constant selectors [PR94279]", see attached.


Grüße
 Thomas



[-- Attachment #2: 0001-rtl-Harden-set_noop_p-for-non-constant-selectors-PR9.patch --]
[-- Type: text/x-diff, Size: 1779 bytes --]

From f2c2eaaf8fb5c66ae372bb526b2b2fe67a9c5c39 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 22 Apr 2020 16:58:44 +0200
Subject: [PATCH] [rtl] Harden 'set_noop_p' for non-constant selectors
 [PR94279]

... given that the GCN target did away with the constant 'vec_select'
restriction.

	gcc/
	PR target/94279
	* rtlanal.c (set_noop_p): Handle non-constant selectors.
---
 gcc/ChangeLog |  3 +++
 gcc/rtlanal.c | 12 +++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2ba39f67200f..ef851ef84626 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@
 2020-04-29  Thomas Schwinge  <thomas@codesourcery.com>
 
+	PR target/94279
+	* rtlanal.c (set_noop_p): Handle non-constant selectors.
+
 	PR target/94282
 	* common/config/gcn/gcn-common.c (gcn_except_unwind_info): New
 	function.
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index c7ab86e228b1..0ebde7622db6 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1631,12 +1631,18 @@ set_noop_p (const_rtx set)
       int i;
       rtx par = XEXP (src, 1);
       rtx src0 = XEXP (src, 0);
-      poly_int64 c0 = rtx_to_poly_int64 (XVECEXP (par, 0, 0));
+      poly_int64 c0;
+      if (!poly_int_rtx_p (XVECEXP (par, 0, 0), &c0))
+	return 0;
       poly_int64 offset = GET_MODE_UNIT_SIZE (GET_MODE (src0)) * c0;
 
       for (i = 1; i < XVECLEN (par, 0); i++)
-	if (maybe_ne (rtx_to_poly_int64 (XVECEXP (par, 0, i)), c0 + i))
-	  return 0;
+	{
+	  poly_int64 c0i;
+	  if (!poly_int_rtx_p (XVECEXP (par, 0, i), &c0i)
+	      || maybe_ne (c0i, c0 + i))
+	    return 0;
+	}
       return
 	REG_CAN_CHANGE_MODE_P (REGNO (dst), GET_MODE (src0), GET_MODE (dst))
 	&& simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
-- 
2.26.2



end of thread, other threads:[~2020-04-29  8:44 UTC | newest]

Thread overview: 76+ messages
2013-11-06 13:37 [Patch, RTL] Eliminate redundant vec_select moves Tejas Belagod
2013-11-06 14:07 ` Richard Biener
2013-11-06 16:45   ` Bill Schmidt
2013-11-06 17:07     ` Bill Schmidt
2013-11-06 14:25 ` Richard Sandiford
2013-11-06 15:37   ` Tejas Belagod
2013-11-06 16:14     ` Richard Sandiford
2013-11-06 17:11       ` Tejas Belagod
2013-11-06 18:34         ` Richard Sandiford
2013-11-06 19:42           ` Bill Schmidt
2013-11-07 14:37           ` Tejas Belagod
2013-11-07 15:15             ` Richard Sandiford
2013-11-07 18:05               ` Tejas Belagod
2013-11-10 20:59                 ` Richard Sandiford
2013-11-27 17:59                   ` Tejas Belagod
2013-11-28 11:33                     ` Richard Sandiford
2013-12-04 16:07                       ` Tejas Belagod
2013-12-04 16:14                         ` H.J. Lu
2013-12-04 17:29                           ` Jeff Law
2013-12-04 17:31                             ` H.J. Lu
2013-12-05 13:17                               ` Tejas Belagod
2013-12-05 13:30                                 ` H.J. Lu
2013-12-05 13:42                                   ` Kirill Yukhin
2013-12-09  6:51                                     ` Kirill Yukhin
2013-12-09  9:56                                       ` Tejas Belagod
2013-12-09 12:01                                         ` Richard Sandiford
2013-12-09 13:00                                         ` H.J. Lu
2013-12-09 13:49                                           ` H.J. Lu
2013-12-09 22:08                                             ` H.J. Lu
2013-12-10 14:53                                               ` Kirill Yukhin
2013-12-10 16:52                                                 ` Paul_Koning
2013-12-10 16:07                                               ` Kirill Yukhin
2013-12-10 16:24                                                 ` H.J. Lu
2013-12-10 17:07                                                   ` Kirill Yukhin
2013-12-10 17:14                                                     ` H.J. Lu
2013-12-10 17:26                                                       ` Tejas Belagod
2013-12-10 17:39                                                         ` H.J. Lu
2013-12-10 19:05                                                           ` Tejas Belagod
2013-12-10 19:12                                                             ` H.J. Lu
2013-12-10 19:52                                                               ` Paul_Koning
2013-12-10 17:57                                                   ` Richard Sandiford
2013-12-10 18:21                                                     ` H.J. Lu
2013-12-10 18:26                                                       ` Richard Sandiford
2013-12-10 18:33                                                         ` H.J. Lu
2013-12-10 18:45                                                           ` Richard Sandiford
2013-12-10 18:46                                                             ` H.J. Lu
2013-12-10 20:40                                                             ` Richard Henderson
2013-12-10 21:09                                                               ` H.J. Lu
2013-12-10 21:51                                                                 ` H.J. Lu
2013-12-10 22:25                                                                   ` Tejas Belagod
2013-12-10 22:33                                                                     ` H.J. Lu
2013-12-11  1:33                                                                       ` H.J. Lu
2013-12-11  9:14                                                               ` Richard Sandiford
2013-12-11 13:10                                                                 ` H.J. Lu
2013-12-11 15:49                                                                   ` Richard Sandiford
2013-12-11 16:09                                                                     ` H.J. Lu
2013-12-11 16:26                                                                       ` Tejas Belagod
2013-12-11 16:35                                                                         ` H.J. Lu
2013-12-11 16:45                                                                           ` Tejas Belagod
2013-12-14 16:32                                                                     ` H.J. Lu
2013-12-14 22:44                                                                       ` RFC: PATCH: Add subreg_byte to REG_CANNOT_CHANGE_MODE_P H.J. Lu
2013-12-10 17:02                                                 ` [Patch, RTL] Eliminate redundant vec_select moves H.J. Lu
2013-12-10 17:11                                                   ` Kirill Yukhin
2013-12-10 17:12                                                     ` H.J. Lu
2013-12-05 22:38                           ` Jakub Jelinek
2013-12-06 11:36                             ` Tejas Belagod
2013-12-06 17:12                             ` Tejas Belagod
2013-12-06 17:20                               ` Jakub Jelinek
2013-12-04 17:36                         ` Richard Sandiford
2013-12-04 20:04                         ` Jeff Law
2013-12-05 16:12                           ` Tejas Belagod
2013-12-05 16:20                             ` Jeff Law
2020-04-22 16:43                         ` [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279] (was: [Patch, RTL] Eliminate redundant vec_select moves) Thomas Schwinge
2020-04-22 17:01                           ` [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279] Andrew Stubbs
2020-04-22 17:23                             ` Richard Sandiford
2020-04-29  8:44                               ` Thomas Schwinge
