public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Mode change for bswap pattern expansion
@ 2014-01-27 11:28 Paulo Matos
  2014-01-27 16:24 ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Paulo Matos @ 2014-01-27 11:28 UTC (permalink / raw)
  To: gcc

Hello,

On a vector processor we can do a bswapsi with two instructions, by first rotating half-words (16 bits) by 8 and then rotating full words by 16. 
However, this means expanding:
(set (match_operand:SI 0 "register_operand" "")
     (bswap:SI (match_operand:SI 1 "register_operand" "")))

to:
(set (match_dup:V2HI 0)
     (rotate:V2HI (match_dup:V2HI 1)
                  (const_int 8)))
(set (match_dup:SI 0)
     (rotate:SI (match_dup:SI 0)
                (const_int 16)))

This is obviously not correct, because match_dup cannot set the mode. The point I am trying to make is that I can't find a good way to deal with the mode changes. I don't think GCC is too happy if I change the modes of the same operand from one instruction to the other right? The only other way is to emit paradoxical subregs. So something along these lines:
(set (subreg:V2HI (match_dup 0) 0)
     (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
                  (const_int 8)))
(set (match_dup 0)
     (rotate:SI (match_dup 0)
                (const_int 16)))

Is there a better way to handle a situation like this?

Cheers,

Paulo Matos



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Mode change for bswap pattern expansion
  2014-01-27 11:28 Mode change for bswap pattern expansion Paulo Matos
@ 2014-01-27 16:24 ` Richard Sandiford
  2014-01-27 16:28   ` Paulo Matos
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2014-01-27 16:24 UTC (permalink / raw)
  To: Paulo Matos; +Cc: gcc

Paulo Matos <pmatos@broadcom.com> writes:
> On a vector processor we can do a bswapsi with two instructions, by first rotating half-words (16 bits) by 8 and then rotating full words by 16. 
> However, this means expanding:
> (set (match_operand:SI 0 "register_operand" "")
>      (bswap:SI (match_operand:SI 1 "register_operand" "")))
>
> to:
> (set (match_dup:V2HI 0)
>      (rotate:V2HI (match_dup:V2HI 1)
>                   (const_int 8)))
> (set (match_dup:SI 0)
>      (rotate:SI (match_dup:SI 0)
>                 (const_int 16)))
>
> This is obviously not correct, because match_dup cannot set the mode. The point I am trying to make is that I can't find a good way to deal with the mode changes. I don't think GCC is too happy if I change the modes of the same operand from one instruction to the other right? The only other way is to emit paradoxical subregs. So something along these lines:
> (set (subreg:V2HI (match_dup 0) 0)
>      (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
>                   (const_int 8)))
> (set (match_dup 0)
>      (rotate:SI (match_dup 0)
>                 (const_int 16)))

It's usually better not to hard-code the subregs in the pattern.
Instead you could use C code to create the subregs, e.g.:

  [(set (match_dup 3)
        (rotate:V2HI (match_dup 2)
                     (const_int 8)))
   (set (match_dup 0)
        (rotate:SI (match_dup 4)
                   (const_int 16)))]
  ""
{
  operands[2] = gen_lowpart (V2HImode, operands[1]);
  operands[3] = gen_reg_rtx (V2HImode);
  operands[4] = gen_lowpart (SImode, operands[3]);
}

so that any hard regs are correctly handled.  Or it might be easier to code
it using emit_insn (gen_* (...))s instead.

BTW, paradoxical subregs are where the outer mode is strictly larger
than the inner mode.

MIPS uses essentially the same sequence, except that it has a special
instruction to do the first rotate (WSBH), rather than it being an instance
of a general vector rotate.  For MIPS we just model it as an unspec SImode
operation.  Maybe that would be easier here too.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: Mode change for bswap pattern expansion
  2014-01-27 16:24 ` Richard Sandiford
@ 2014-01-27 16:28   ` Paulo Matos
  2014-01-27 17:00     ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Paulo Matos @ 2014-01-27 16:28 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc



> -----Original Message-----
> From: Richard Sandiford [mailto:rdsandiford@googlemail.com]
> Sent: 27 January 2014 16:06
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Mode change for bswap pattern expansion
> 
> Paulo Matos <pmatos@broadcom.com> writes:
> > On a vector processor we can do a bswapsi with two instructions, by first
> rotating half-words (16 bits) by 8 and then rotating full words by 16.
> > However, this means expanding:
> > (set (match_operand:SI 0 "register_operand" "")
> >      (bswap:SI (match_operand:SI 1 "register_operand" "")))
> >
> > to:
> > (set (match_dup:V2HI 0)
> >      (rotate:V2HI (match_dup:V2HI 1)
> >                   (const_int 8)))
> > (set (match_dup:SI 0)
> >      (rotate:SI (match_dup:SI 0)
> >                 (const_int 16)))
> >
> > This is obviously not correct, because match_dup cannot set the mode. The point
> I am trying to make is that I can't find a good way to deal with the mode
> changes. I don't think GCC is too happy if I change the modes of the same operand
> from one instruction to the other right? The only other way is to emit
> paradoxical subregs. So something along these lines:
> > (set (subreg:V2HI (match_dup 0) 0)
> >      (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
> >                   (const_int 8)))
> > (set (match_dup 0)
> >      (rotate:SI (match_dup 0)
> >                 (const_int 16)))
> 
> It's usually better not to hard-code the subregs in the pattern.
> Instead you could use C code to create the subregs, e.g.:
> 
>   [(set (match_dup 3)
>         (rotate:V2HI (match_dup 2)
>                      (const_int 8)))
>    (set (match_dup 0)
>         (rotate:SI (match_dup 4)
>                    (const_int 16)))]
>   ""
> {
>   operands[2] = gen_lowpart (V2HImode, operands[1]);
>   operands[3] = gen_reg_rtx (V2HImode);
>   operands[4] = gen_lowpart (SImode, operands[3]);
> }
> 
> so that any hard regs are correctly handled.  Or it might be easier to code
> it using emit_insn (gen_* (...))s instead.
> 
> BTW, paradoxical subregs are where the outer mode is strictly larger
> than the inner mode.
>

That's right. My mis-understanding.
 
> MIPS uses essentially the same sequence, except that it has a special
> instruction to do the first rotate (WSBH), rather than it being an instance
> of a general vector rotate.  For MIPS we just model it as an unspec SImode
> operation.  Maybe that would be easier here too.
> 

I will look at how MIPS is doing it.

However, the unspec SI has severe performance penalties on my port since it is able to issue more that one instruction per cycle, therefore having each instruction separately allows the scheduler to issue each of the bswapsi parts into different slots with other instructions.


Thanks,

Paulo Matos

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Mode change for bswap pattern expansion
  2014-01-27 16:28   ` Paulo Matos
@ 2014-01-27 17:00     ` Richard Sandiford
  2014-01-27 17:57       ` Paulo Matos
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2014-01-27 17:00 UTC (permalink / raw)
  To: Paulo Matos; +Cc: gcc

Paulo Matos <pmatos@broadcom.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford [mailto:rdsandiford@googlemail.com]
>> Sent: 27 January 2014 16:06
>> To: Paulo Matos
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: Mode change for bswap pattern expansion
>> 
>> Paulo Matos <pmatos@broadcom.com> writes:
>> > On a vector processor we can do a bswapsi with two instructions, by first
>> rotating half-words (16 bits) by 8 and then rotating full words by 16.
>> > However, this means expanding:
>> > (set (match_operand:SI 0 "register_operand" "")
>> >      (bswap:SI (match_operand:SI 1 "register_operand" "")))
>> >
>> > to:
>> > (set (match_dup:V2HI 0)
>> >      (rotate:V2HI (match_dup:V2HI 1)
>> >                   (const_int 8)))
>> > (set (match_dup:SI 0)
>> >      (rotate:SI (match_dup:SI 0)
>> >                 (const_int 16)))
>> >
>> > This is obviously not correct, because match_dup cannot set the mode. The point
>> I am trying to make is that I can't find a good way to deal with the mode
>> changes. I don't think GCC is too happy if I change the modes of the
>> same operand
>> from one instruction to the other right? The only other way is to emit
>> paradoxical subregs. So something along these lines:
>> > (set (subreg:V2HI (match_dup 0) 0)
>> >      (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
>> >                   (const_int 8)))
>> > (set (match_dup 0)
>> >      (rotate:SI (match_dup 0)
>> >                 (const_int 16)))
>> 
>> It's usually better not to hard-code the subregs in the pattern.
>> Instead you could use C code to create the subregs, e.g.:
>> 
>>   [(set (match_dup 3)
>>         (rotate:V2HI (match_dup 2)
>>                      (const_int 8)))
>>    (set (match_dup 0)
>>         (rotate:SI (match_dup 4)
>>                    (const_int 16)))]
>>   ""
>> {
>>   operands[2] = gen_lowpart (V2HImode, operands[1]);
>>   operands[3] = gen_reg_rtx (V2HImode);
>>   operands[4] = gen_lowpart (SImode, operands[3]);
>> }
>> 
>> so that any hard regs are correctly handled.  Or it might be easier to code
>> it using emit_insn (gen_* (...))s instead.
>> 
>> BTW, paradoxical subregs are where the outer mode is strictly larger
>> than the inner mode.
>>
>
> That's right. My mis-understanding.
>  
>> MIPS uses essentially the same sequence, except that it has a special
>> instruction to do the first rotate (WSBH), rather than it being an instance
>> of a general vector rotate.  For MIPS we just model it as an unspec SImode
>> operation.  Maybe that would be easier here too.
>> 
>
> I will look at how MIPS is doing it.
>
> However, the unspec SI has severe performance penalties on my port since
> it is able to issue more that one instruction per cycle, therefore
> having each instruction separately allows the scheduler to issue each of
> the bswapsi parts into different slots with other instructions.

Sorry, I meant we use an unspec for the first ("V2HI") rotate.
I.e. rather than:

  (set (subreg:V2HI (match_dup 2) 0)
       (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
                    (const_int 8)))
  (set (match_dup 0)
       (rotate:SI (match_dup 2)
                  (const_int 16)))

we have:

  (set (match_dup 2) (unspec:SI [(match_dup 1)] UNSPEC_FOO))
  (set (match_dup 0)
       (rotate:SI (match_dup 2)
                  (const_int 16)))

In your case the define_insn for the UNSPEC_FOO pattern would have the
same attributes as a V2HI rotate, so it should get scheduled in the same way.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: Mode change for bswap pattern expansion
  2014-01-27 17:00     ` Richard Sandiford
@ 2014-01-27 17:57       ` Paulo Matos
  2014-01-27 19:29         ` pinskia
  0 siblings, 1 reply; 6+ messages in thread
From: Paulo Matos @ 2014-01-27 17:57 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc

> -----Original Message-----
> From: Richard Sandiford [mailto:rsandifo@linux.vnet.ibm.com]
> Sent: 27 January 2014 16:50
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Mode change for bswap pattern expansion
>  
> Sorry, I meant we use an unspec for the first ("V2HI") rotate.
> I.e. rather than:
> 
>   (set (subreg:V2HI (match_dup 2) 0)
>        (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
>                     (const_int 8)))
>   (set (match_dup 0)
>        (rotate:SI (match_dup 2)
>                   (const_int 16)))
> 
> we have:
> 
>   (set (match_dup 2) (unspec:SI [(match_dup 1)] UNSPEC_FOO))
>   (set (match_dup 0)
>        (rotate:SI (match_dup 2)
>                   (const_int 16)))
> 
> In your case the define_insn for the UNSPEC_FOO pattern would have the
> same attributes as a V2HI rotate, so it should get scheduled in the same way.
> 

In that case it would work. My only concern would then be if it prevents further optimizations. On the other hand I am not sure if GCC would try to optimize a rotate with vector V2HI mode...
Might give both solutions a try and see what results I get.

Thanks,

Paulo Matos

> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Mode change for bswap pattern expansion
  2014-01-27 17:57       ` Paulo Matos
@ 2014-01-27 19:29         ` pinskia
  0 siblings, 0 replies; 6+ messages in thread
From: pinskia @ 2014-01-27 19:29 UTC (permalink / raw)
  To: Paulo Matos; +Cc: Richard Sandiford, gcc



On Jan 27, 2014, at 8:59 AM, Paulo Matos <pmatos@broadcom.com> wrote:

>> -----Original Message-----
>> From: Richard Sandiford [mailto:rsandifo@linux.vnet.ibm.com]
>> Sent: 27 January 2014 16:50
>> To: Paulo Matos
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: Mode change for bswap pattern expansion
>> 
>> Sorry, I meant we use an unspec for the first ("V2HI") rotate.
>> I.e. rather than:
>> 
>>  (set (subreg:V2HI (match_dup 2) 0)
>>       (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
>>                    (const_int 8)))
>>  (set (match_dup 0)
>>       (rotate:SI (match_dup 2)
>>                  (const_int 16)))
>> 
>> we have:
>> 
>>  (set (match_dup 2) (unspec:SI [(match_dup 1)] UNSPEC_FOO))
>>  (set (match_dup 0)
>>       (rotate:SI (match_dup 2)
>>                  (const_int 16)))
>> 
>> In your case the define_insn for the UNSPEC_FOO pattern would have the
>> same attributes as a V2HI rotate, so it should get scheduled in the same way.
> 
> In that case it would work. My only concern would then be if it prevents further optimizations. On the other hand I am not sure if GCC would try to optimize a rotate with vector V2HI mode...
> Might give both solutions a try and see what results I get.

Maybe do a split after reload instead?

Thanks,
Andrew

> 
> Thanks,
> 
> Paulo Matos
> 
>> Thanks,
>> Richard
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-01-27 18:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-27 11:28 Mode change for bswap pattern expansion Paulo Matos
2014-01-27 16:24 ` Richard Sandiford
2014-01-27 16:28   ` Paulo Matos
2014-01-27 17:00     ` Richard Sandiford
2014-01-27 17:57       ` Paulo Matos
2014-01-27 19:29         ` pinskia

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).