subreg vs vec

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* subreg vs vec_select
@ 2020-09-09  9:50 Ilya Leoshkevich
  2020-09-09 21:09 ` Segher Boessenkool
  0 siblings, 1 reply; 9+ messages in thread
From: Ilya Leoshkevich @ 2020-09-09  9:50 UTC (permalink / raw)
  To: gcc

Hi!

I have a vector pseudo containing a single 128-bit value (V1TFmode) and
I need to access its last 64 bits (DFmode). Which of the two options
is better?

(subreg:DF (reg:V1TF) 8)

or

(vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int 1)]))

If I use the first one, I run into a problem with set_noop_p (): it
thinks that

(set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))

is a no-op, because it doesn't check the mode after stripping the
subreg:

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616

However this is not correct, because SET_DEST is the second register in
a register pair, and SET_SRC is half of a vector register that overlaps
the first register in the corresponding pair. So it looks as if mode
needs to be considered there.

This helps:

--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
        return 0;
       src = SUBREG_REG (src);
       dst = SUBREG_REG (dst);
+      if (GET_MODE (src) != GET_MODE (dst))
+       return 0;
     }

but I'm not sure whether I'm not missing something about subreg
semantics in the first place.

Best regards,
Ilya

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: subreg vs vec_select
  2020-09-09  9:50 subreg vs vec_select Ilya Leoshkevich
@ 2020-09-09 21:09 ` Segher Boessenkool
  2020-09-10 10:21   ` Ilya Leoshkevich
  0 siblings, 1 reply; 9+ messages in thread
From: Segher Boessenkool @ 2020-09-09 21:09 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: gcc

Hi Ilya,

On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc wrote:
> I have a vector pseudo containing a single 128-bit value (V1TFmode) and
> I need to access its last 64 bits (DFmode). Which of the two options
> is better?
> 
> (subreg:DF (reg:V1TF) 8)
> 
> or
> 
> (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int 1)]))
> 
> If I use the first one, I run into a problem with set_noop_p (): it
> thinks that
> 
> (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> 
> is a no-op, because it doesn't check the mode after stripping the
> subreg:
> 
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> 
> However this is not correct, because SET_DEST is the second register in
> a register pair, and SET_SRC is half of a vector register that overlaps
> the first register in the corresponding pair. So it looks as if mode
> needs to be considered there.

Yes.

> This helps:
> 
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
>         return 0;
>        src = SUBREG_REG (src);
>        dst = SUBREG_REG (dst);
> +      if (GET_MODE (src) != GET_MODE (dst))
> +       return 0;
>      }
> 
> but I'm not sure whether I'm not missing something about subreg
> semantics in the first place.

You probably should just see if both modes are the same number of hard
registers?  HARD_REGNO_NREGS.


Segher

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: subreg vs vec_select
  2020-09-09 21:09 ` Segher Boessenkool
@ 2020-09-10 10:21   ` Ilya Leoshkevich
  2020-09-10 22:30     ` Segher Boessenkool
  2020-09-11  9:46     ` Richard Sandiford
  0 siblings, 2 replies; 9+ messages in thread
From: Ilya Leoshkevich @ 2020-09-10 10:21 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc

On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> Hi Ilya,
> 
> On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc
> wrote:
> > I have a vector pseudo containing a single 128-bit value (V1TFmode)
> > and
> > I need to access its last 64 bits (DFmode). Which of the two
> > options
> > is better?
> > 
> > (subreg:DF (reg:V1TF) 8)
> > 
> > or
> > 
> > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int
> > 1)]))
> > 
> > If I use the first one, I run into a problem with set_noop_p (): it
> > thinks that
> > 
> > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> > 
> > is a no-op, because it doesn't check the mode after stripping the
> > subreg:
> > 
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > 
> > However this is not correct, because SET_DEST is the second
> > register in
> > a register pair, and SET_SRC is half of a vector register that
> > overlaps
> > the first register in the corresponding pair. So it looks as if
> > mode
> > needs to be considered there.
> 
> Yes.
> 
> > This helps:
> > 
> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> >         return 0;
> >        src = SUBREG_REG (src);
> >        dst = SUBREG_REG (dst);
> > +      if (GET_MODE (src) != GET_MODE (dst))
> > +       return 0;
> >      }
> > 
> > but I'm not sure whether I'm not missing something about subreg
> > semantics in the first place.
> 
> You probably should just see if both modes are the same number of
> hard
> registers?  HARD_REGNO_NREGS.

I've refined my patch as follows:

--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
        return 0;
       src = SUBREG_REG (src);
       dst = SUBREG_REG (dst);
+      if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
+         && HARD_REGISTER_P (dst)
+         && hard_regno_nregs (REGNO (src), GET_MODE (src))
+                != hard_regno_nregs (REGNO (dst), GET_MODE (dst)))
+       return 0;
     }
 
This also helps, and is less restrictive than my first variant.
Two questions, just for my understanding:

1) This mode confusion problem must never happen to pseudos, because,
   unlike hard registers, pseudos must be always referred to in their
   natural mode.  Is this correct?

2) Can there be a hypothetical machine, where modes XF and YF refer to
   64-bit and 128-bit register pairs respectively?  This would cause
   the mode confusion problem again.  Is there anything in RTL
   semantics that would prohibit existence of such modes?

Best regards,
Ilya


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: subreg vs vec_select
  2020-09-10 10:21   ` Ilya Leoshkevich
@ 2020-09-10 22:30     ` Segher Boessenkool
  2020-09-11  9:46     ` Richard Sandiford
  1 sibling, 0 replies; 9+ messages in thread
From: Segher Boessenkool @ 2020-09-10 22:30 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: gcc

Hi!

On Thu, Sep 10, 2020 at 12:21:47PM +0200, Ilya Leoshkevich wrote:
> On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc
> > wrote:
> > > I have a vector pseudo containing a single 128-bit value (V1TFmode)
> > > and
> > > I need to access its last 64 bits (DFmode). Which of the two
> > > options
> > > is better?
> > > 
> > > (subreg:DF (reg:V1TF) 8)
> > > 
> > > or
> > > 
> > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int
> > > 1)]))
> > > 
> > > If I use the first one, I run into a problem with set_noop_p (): it
> > > thinks that
> > > 
> > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> > > 
> > > is a no-op, because it doesn't check the mode after stripping the
> > > subreg:
> > > 
> > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > > 
> > > However this is not correct, because SET_DEST is the second
> > > register in
> > > a register pair, and SET_SRC is half of a vector register that
> > > overlaps
> > > the first register in the corresponding pair. So it looks as if
> > > mode
> > > needs to be considered there.
> > 
> > Yes.
> > 
> > > This helps:
> > > 
> > > --- a/gcc/rtlanal.c
> > > +++ b/gcc/rtlanal.c
> > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > >         return 0;
> > >        src = SUBREG_REG (src);
> > >        dst = SUBREG_REG (dst);
> > > +      if (GET_MODE (src) != GET_MODE (dst))
> > > +       return 0;
> > >      }
> > > 
> > > but I'm not sure whether I'm not missing something about subreg
> > > semantics in the first place.
> > 
> > You probably should just see if both modes are the same number of
> > hard
> > registers?  HARD_REGNO_NREGS.
> 
> I've refined my patch as follows:
> 
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
>         return 0;
>        src = SUBREG_REG (src);
>        dst = SUBREG_REG (dst);
> +      if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
> +         && HARD_REGISTER_P (dst)
> +         && hard_regno_nregs (REGNO (src), GET_MODE (src))
> +                != hard_regno_nregs (REGNO (dst), GET_MODE (dst)))

(The "!" should align with the "h".)

> +       return 0;
>      }

Looks good to me, thanks!

> This also helps, and is less restrictive than my first variant.
> Two questions, just for my understanding:
> 
> 1) This mode confusion problem must never happen to pseudos, because,
>    unlike hard registers, pseudos must be always referred to in their
>    natural mode.  Is this correct?

Only hard registers can be accessed in more than one mode, yes.  To
access pseudos in another mode you need a subreg.

> 2) Can there be a hypothetical machine, where modes XF and YF refer to
>    64-bit and 128-bit register pairs respectively?  This would cause
>    the mode confusion problem again.  Is there anything in RTL
>    semantics that would prohibit existence of such modes?

They would have to be the same hard register number as well.  Not very
likely, but I don't see why it couldn't happen theoretically.


Segher

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: subreg vs vec_select
  2020-09-10 10:21   ` Ilya Leoshkevich
  2020-09-10 22:30     ` Segher Boessenkool
@ 2020-09-11  9:46     ` Richard Sandiford
  2020-09-11 10:17       ` Ilya Leoshkevich
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-09-11  9:46 UTC (permalink / raw)
  To: Ilya Leoshkevich via Gcc; +Cc: Segher Boessenkool, Ilya Leoshkevich

Ilya Leoshkevich via Gcc <gcc@gcc.gnu.org> writes:
> On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
>> Hi Ilya,
>> 
>> On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc
>> wrote:
>> > I have a vector pseudo containing a single 128-bit value (V1TFmode)
>> > and
>> > I need to access its last 64 bits (DFmode). Which of the two
>> > options
>> > is better?
>> > 
>> > (subreg:DF (reg:V1TF) 8)
>> > 
>> > or
>> > 
>> > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int
>> > 1)]))
>> > 
>> > If I use the first one, I run into a problem with set_noop_p (): it
>> > thinks that
>> > 
>> > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
>> > 
>> > is a no-op, because it doesn't check the mode after stripping the
>> > subreg:
>> > 
>> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
>> > 
>> > However this is not correct, because SET_DEST is the second
>> > register in
>> > a register pair, and SET_SRC is half of a vector register that
>> > overlaps
>> > the first register in the corresponding pair. So it looks as if
>> > mode
>> > needs to be considered there.
>> 
>> Yes.
>> 
>> > This helps:
>> > 
>> > --- a/gcc/rtlanal.c
>> > +++ b/gcc/rtlanal.c
>> > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
>> >         return 0;
>> >        src = SUBREG_REG (src);
>> >        dst = SUBREG_REG (dst);
>> > +      if (GET_MODE (src) != GET_MODE (dst))
>> > +       return 0;
>> >      }
>> > 
>> > but I'm not sure whether I'm not missing something about subreg
>> > semantics in the first place.
>> 
>> You probably should just see if both modes are the same number of
>> hard
>> registers?  HARD_REGNO_NREGS.
>
> I've refined my patch as follows:
>
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
>         return 0;
>        src = SUBREG_REG (src);
>        dst = SUBREG_REG (dst);
> +      if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
> +         && HARD_REGISTER_P (dst)
> +         && hard_regno_nregs (REGNO (src), GET_MODE (src))
> +                != hard_regno_nregs (REGNO (dst), GET_MODE (dst)))
> +       return 0;
>      }

I think checking the mode would be safer.  Having the same number
of registers doesn't mean that the bits are distributed across the
registers in the same way.

Out of interest, why can't the subregs in the example above get
folded down to hard registers?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: subreg vs vec_select
  2020-09-11  9:46     ` Richard Sandiford
@ 2020-09-11 10:17       ` Ilya Leoshkevich
  2020-09-11 10:22         ` Ilya Leoshkevich
  0 siblings, 1 reply; 9+ messages in thread
From: Ilya Leoshkevich @ 2020-09-11 10:17 UTC (permalink / raw)
  To: Richard Sandiford, Ilya Leoshkevich via Gcc; +Cc: Segher Boessenkool

On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote:
> Ilya Leoshkevich via Gcc <gcc@gcc.gnu.org> writes:
> > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> > > Hi Ilya,
> > > 
> > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via
> > > Gcc
> > > wrote:
> > > > I have a vector pseudo containing a single 128-bit value
> > > > (V1TFmode)
> > > > and
> > > > I need to access its last 64 bits (DFmode). Which of the two
> > > > options
> > > > is better?
> > > > 
> > > > (subreg:DF (reg:V1TF) 8)
> > > > 
> > > > or
> > > > 
> > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int
> > > > 1)]))
> > > > 
> > > > If I use the first one, I run into a problem with set_noop_p
> > > > (): it
> > > > thinks that
> > > > 
> > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> > > > 
> > > > is a no-op, because it doesn't check the mode after stripping
> > > > the
> > > > subreg:
> > > > 
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > > > 
> > > > However this is not correct, because SET_DEST is the second
> > > > register in
> > > > a register pair, and SET_SRC is half of a vector register that
> > > > overlaps
> > > > the first register in the corresponding pair. So it looks as if
> > > > mode
> > > > needs to be considered there.
> > > 
> > > Yes.
> > > 
> > > > This helps:
> > > > 
> > > > --- a/gcc/rtlanal.c
> > > > +++ b/gcc/rtlanal.c
> > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > > >         return 0;
> > > >        src = SUBREG_REG (src);
> > > >        dst = SUBREG_REG (dst);
> > > > +      if (GET_MODE (src) != GET_MODE (dst))
> > > > +       return 0;
> > > >      }
> > > > 
> > > > but I'm not sure whether I'm not missing something about subreg
> > > > semantics in the first place.
> > > 
> > > You probably should just see if both modes are the same number of
> > > hard
> > > registers?  HARD_REGNO_NREGS.
> > 
> > I've refined my patch as follows:
> > 
> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
> >         return 0;
> >        src = SUBREG_REG (src);
> >        dst = SUBREG_REG (dst);
> > +      if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
> > +         && HARD_REGISTER_P (dst)
> > +         && hard_regno_nregs (REGNO (src), GET_MODE (src))
> > +                != hard_regno_nregs (REGNO (dst), GET_MODE (dst)))
> > +       return 0;
> >      }
> 
> I think checking the mode would be safer.  Having the same number
> of registers doesn't mean that the bits are distributed across the
> registers in the same way.

Yeah, that's what I was trying to express with this hypothetical
machine example.  On the other hand, checking mode is too pessimistic.
E.g. if we talk about s390 GPRs, then considering

(set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4))

a no-op is fine from my perspective.  So having a more restrictive
check might be desirable.  Is there a way to ask the backend how the
subreg bits are distributed?

> Out of interest, why can't the subregs in the example above get
> folded down to hard registers?

I think this is because the offsets are not 0.  I could imagine folding
(subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a
backend hook for this.  Does anything like this exist?  Also, can
(subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply
the second doubleword of 128-bit %v0 vector register.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: subreg vs vec_select
  2020-09-11 10:17       ` Ilya Leoshkevich
@ 2020-09-11 10:22         ` Ilya Leoshkevich
  2020-09-11 11:14           ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Ilya Leoshkevich @ 2020-09-11 10:22 UTC (permalink / raw)
  To: Richard Sandiford, Ilya Leoshkevich via Gcc; +Cc: Segher Boessenkool

On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote:
> On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote:
> > Ilya Leoshkevich via Gcc <gcc@gcc.gnu.org> writes:
> > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> > > > Hi Ilya,
> > > > 
> > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via
> > > > Gcc
> > > > wrote:
> > > > > I have a vector pseudo containing a single 128-bit value
> > > > > (V1TFmode)
> > > > > and
> > > > > I need to access its last 64 bits (DFmode). Which of the two
> > > > > options
> > > > > is better?
> > > > > 
> > > > > (subreg:DF (reg:V1TF) 8)
> > > > > 
> > > > > or
> > > > > 
> > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel
> > > > > [(const_int
> > > > > 1)]))
> > > > > 
> > > > > If I use the first one, I run into a problem with set_noop_p
> > > > > (): it
> > > > > thinks that
> > > > > 
> > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> > > > > 
> > > > > is a no-op, because it doesn't check the mode after stripping
> > > > > the
> > > > > subreg:
> > > > > 
> > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > > > > 
> > > > > However this is not correct, because SET_DEST is the second
> > > > > register in
> > > > > a register pair, and SET_SRC is half of a vector register
> > > > > that
> > > > > overlaps
> > > > > the first register in the corresponding pair. So it looks as
> > > > > if
> > > > > mode
> > > > > needs to be considered there.
> > > > 
> > > > Yes.
> > > > 
> > > > > This helps:
> > > > > 
> > > > > --- a/gcc/rtlanal.c
> > > > > +++ b/gcc/rtlanal.c
> > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > > > >         return 0;
> > > > >        src = SUBREG_REG (src);
> > > > >        dst = SUBREG_REG (dst);
> > > > > +      if (GET_MODE (src) != GET_MODE (dst))
> > > > > +       return 0;
> > > > >      }
> > > > > 
> > > > > but I'm not sure whether I'm not missing something about
> > > > > subreg
> > > > > semantics in the first place.
> > > > 
> > > > You probably should just see if both modes are the same number
> > > > of
> > > > hard
> > > > registers?  HARD_REGNO_NREGS.
> > > 
> > > I've refined my patch as follows:
> > > 
> > > --- a/gcc/rtlanal.c
> > > +++ b/gcc/rtlanal.c
> > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
> > >         return 0;
> > >        src = SUBREG_REG (src);
> > >        dst = SUBREG_REG (dst);
> > > +      if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
> > > +         && HARD_REGISTER_P (dst)
> > > +         && hard_regno_nregs (REGNO (src), GET_MODE (src))
> > > +                != hard_regno_nregs (REGNO (dst), GET_MODE
> > > (dst)))
> > > +       return 0;
> > >      }
> > 
> > I think checking the mode would be safer.  Having the same number
> > of registers doesn't mean that the bits are distributed across the
> > registers in the same way.
> 
> Yeah, that's what I was trying to express with this hypothetical
> machine example.  On the other hand, checking mode is too
> pessimistic.
> E.g. if we talk about s390 GPRs, then considering
> 
> (set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4))

Sorry, bad example: here the hard register modes actually match.
But it's probably possible to come up with something similar, where
the hard reg is accessed with different modes, but when we add subregs
on top, then we end up selecting the same bits.

> a no-op is fine from my perspective.  So having a more restrictive
> check might be desirable.  Is there a way to ask the backend how the
> subreg bits are distributed?
> 
> > Out of interest, why can't the subregs in the example above get
> > folded down to hard registers?
> 
> I think this is because the offsets are not 0.  I could imagine
> folding
> (subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a
> backend hook for this.  Does anything like this exist?  Also, can
> (subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply
> the second doubleword of 128-bit %v0 vector register.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: subreg vs vec_select
  2020-09-11 10:22         ` Ilya Leoshkevich
@ 2020-09-11 11:14           ` Richard Sandiford
  2020-09-11 14:27             ` Ilya Leoshkevich
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-09-11 11:14 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: Ilya Leoshkevich via Gcc, Segher Boessenkool

Ilya Leoshkevich <iii@linux.ibm.com> writes:
> On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote:
>> On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote:
>> > Ilya Leoshkevich via Gcc <gcc@gcc.gnu.org> writes:
>> > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
>> > > > Hi Ilya,
>> > > > 
>> > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via
>> > > > Gcc
>> > > > wrote:
>> > > > > I have a vector pseudo containing a single 128-bit value
>> > > > > (V1TFmode)
>> > > > > and
>> > > > > I need to access its last 64 bits (DFmode). Which of the two
>> > > > > options
>> > > > > is better?
>> > > > > 
>> > > > > (subreg:DF (reg:V1TF) 8)
>> > > > > 
>> > > > > or
>> > > > > 
>> > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel
>> > > > > [(const_int
>> > > > > 1)]))
>> > > > > 
>> > > > > If I use the first one, I run into a problem with set_noop_p
>> > > > > (): it
>> > > > > thinks that
>> > > > > 
>> > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
>> > > > > 
>> > > > > is a no-op, because it doesn't check the mode after stripping
>> > > > > the
>> > > > > subreg:
>> > > > > 
>> > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
>> > > > > 
>> > > > > However this is not correct, because SET_DEST is the second
>> > > > > register in
>> > > > > a register pair, and SET_SRC is half of a vector register
>> > > > > that
>> > > > > overlaps
>> > > > > the first register in the corresponding pair. So it looks as
>> > > > > if
>> > > > > mode
>> > > > > needs to be considered there.
>> > > > 
>> > > > Yes.
>> > > > 
>> > > > > This helps:
>> > > > > 
>> > > > > --- a/gcc/rtlanal.c
>> > > > > +++ b/gcc/rtlanal.c
>> > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
>> > > > >         return 0;
>> > > > >        src = SUBREG_REG (src);
>> > > > >        dst = SUBREG_REG (dst);
>> > > > > +      if (GET_MODE (src) != GET_MODE (dst))
>> > > > > +       return 0;
>> > > > >      }
>> > > > > 
>> > > > > but I'm not sure whether I'm not missing something about
>> > > > > subreg
>> > > > > semantics in the first place.
>> > > > 
>> > > > You probably should just see if both modes are the same number
>> > > > of
>> > > > hard
>> > > > registers?  HARD_REGNO_NREGS.
>> > > 
>> > > I've refined my patch as follows:
>> > > 
>> > > --- a/gcc/rtlanal.c
>> > > +++ b/gcc/rtlanal.c
>> > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
>> > >         return 0;
>> > >        src = SUBREG_REG (src);
>> > >        dst = SUBREG_REG (dst);
>> > > +      if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
>> > > +         && HARD_REGISTER_P (dst)
>> > > +         && hard_regno_nregs (REGNO (src), GET_MODE (src))
>> > > +                != hard_regno_nregs (REGNO (dst), GET_MODE
>> > > (dst)))
>> > > +       return 0;
>> > >      }
>> > 
>> > I think checking the mode would be safer.  Having the same number
>> > of registers doesn't mean that the bits are distributed across the
>> > registers in the same way.
>> 
>> Yeah, that's what I was trying to express with this hypothetical
>> machine example.  On the other hand, checking mode is too
>> pessimistic.
>> E.g. if we talk about s390 GPRs, then considering
>> 
>> (set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4))
>
> Sorry, bad example: here the hard register modes actually match.
> But it's probably possible to come up with something similar, where
> the hard reg is accessed with different modes, but when we add subregs
> on top, then we end up selecting the same bits.

Wel, in general, (subreg (reg …) …) shouldn't exist for hard registers.
They should just be folded to the underlying (reg …) instead.
So in the above I'd expect to see:

  (set (reg:SI %r0) (reg:SI %r0))

instead.  I remember there were cases on e.g. e500 where the target
prevented the fold from taking place due to layout issues, but wasn't
sure whether the above FP example above was something similar.

So…

>> a no-op is fine from my perspective.  So having a more restrictive
>> check might be desirable.  Is there a way to ask the backend how the
>> subreg bits are distributed?

…if the bits are distributed evenly in an obvious way, the hard register
subreg should already have been folded to a reg.  I think subregs of
hard registers are rare enough that we should just keep it simple
and safe.

>> > Out of interest, why can't the subregs in the example above get
>> > folded down to hard registers?
>> 
>> I think this is because the offsets are not 0.  I could imagine
>> folding
>> (subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a
>> backend hook for this.  Does anything like this exist?  Also, can
>> (subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply
>> the second doubleword of 128-bit %v0 vector register.

For multi-register values with a regular layout, simplify_subreg_regno
(via subreg_get_info) uses various target queries to work out which
hard registers are actually being accessed.  Is the subreg above
for big-endian or little-endian?  (I hope big-endian given the
8 offset.)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: subreg vs vec_select
  2020-09-11 11:14           ` Richard Sandiford
@ 2020-09-11 14:27             ` Ilya Leoshkevich
  0 siblings, 0 replies; 9+ messages in thread
From: Ilya Leoshkevich @ 2020-09-11 14:27 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Ilya Leoshkevich via Gcc, Segher Boessenkool

On Fri, 2020-09-11 at 12:14 +0100, Richard Sandiford wrote:
> Ilya Leoshkevich <iii@linux.ibm.com> writes:
> > On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote:
> > > On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote:
> > > > Ilya Leoshkevich via Gcc <gcc@gcc.gnu.org> writes:
> > > > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> > > > > > Hi Ilya,
> > > > > > 
> > > > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich
> > > > > > via
> > > > > > Gcc
> > > > > > wrote:
> > > > > > > I have a vector pseudo containing a single 128-bit value
> > > > > > > (V1TFmode)
> > > > > > > and
> > > > > > > I need to access its last 64 bits (DFmode). Which of the
> > > > > > > two
> > > > > > > options
> > > > > > > is better?
> > > > > > > 
> > > > > > > (subreg:DF (reg:V1TF) 8)
> > > > > > > 
> > > > > > > or
> > > > > > > 
> > > > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel
> > > > > > > [(const_int
> > > > > > > 1)]))
> > > > > > > 
> > > > > > > If I use the first one, I run into a problem with
> > > > > > > set_noop_p
> > > > > > > (): it
> > > > > > > thinks that
> > > > > > > 
> > > > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0)
> > > > > > > 8))
> > > > > > > 
> > > > > > > is a no-op, because it doesn't check the mode after
> > > > > > > stripping
> > > > > > > the
> > > > > > > subreg:
> > > > > > > 
> > > > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > > > > > > 
> > > > > > > However this is not correct, because SET_DEST is the
> > > > > > > second
> > > > > > > register in
> > > > > > > a register pair, and SET_SRC is half of a vector register
> > > > > > > that
> > > > > > > overlaps
> > > > > > > the first register in the corresponding pair. So it looks
> > > > > > > as
> > > > > > > if
> > > > > > > mode
> > > > > > > needs to be considered there.
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > > This helps:
> > > > > > > 
> > > > > > > --- a/gcc/rtlanal.c
> > > > > > > +++ b/gcc/rtlanal.c
> > > > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > > > > > >         return 0;
> > > > > > >        src = SUBREG_REG (src);
> > > > > > >        dst = SUBREG_REG (dst);
> > > > > > > +      if (GET_MODE (src) != GET_MODE (dst))
> > > > > > > +       return 0;
> > > > > > >      }
> > > > > > > 
> > > > > > > but I'm not sure whether I'm not missing something about
> > > > > > > subreg
> > > > > > > semantics in the first place.
> > > > > > 
> > > > > > You probably should just see if both modes are the same
> > > > > > number
> > > > > > of
> > > > > > hard
> > > > > > registers?  HARD_REGNO_NREGS.
> > > > > 
> > > > > I've refined my patch as follows:
> > > > > 
> > > > > --- a/gcc/rtlanal.c
> > > > > +++ b/gcc/rtlanal.c
> > > > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
> > > > >         return 0;
> > > > >        src = SUBREG_REG (src);
> > > > >        dst = SUBREG_REG (dst);
> > > > > +      if (REG_P (src) && HARD_REGISTER_P (src) && REG_P
> > > > > (dst)
> > > > > +         && HARD_REGISTER_P (dst)
> > > > > +         && hard_regno_nregs (REGNO (src), GET_MODE (src))
> > > > > +                != hard_regno_nregs (REGNO (dst), GET_MODE
> > > > > (dst)))
> > > > > +       return 0;
> > > > >      }
> > > > 
> > > > I think checking the mode would be safer.  Having the same
> > > > number
> > > > of registers doesn't mean that the bits are distributed across
> > > > the
> > > > registers in the same way.
> > > 
> > > Yeah, that's what I was trying to express with this hypothetical
> > > machine example.  On the other hand, checking mode is too
> > > pessimistic.
> > > E.g. if we talk about s390 GPRs, then considering
> > > 
> > > (set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4))
> > 
> > Sorry, bad example: here the hard register modes actually match.
> > But it's probably possible to come up with something similar, where
> > the hard reg is accessed with different modes, but when we add
> > subregs
> > on top, then we end up selecting the same bits.
> 
> Wel, in general, (subreg (reg …) …) shouldn't exist for hard
> registers.
> They should just be folded to the underlying (reg …) instead.
> So in the above I'd expect to see:
> 
>   (set (reg:SI %r0) (reg:SI %r0))
> 
> instead.  I remember there were cases on e.g. e500 where the target
> prevented the fold from taking place due to layout issues, but wasn't
> sure whether the above FP example above was something similar.
> 
> So…
> 
> > > a no-op is fine from my perspective.  So having a more
> > > restrictive
> > > check might be desirable.  Is there a way to ask the backend how
> > > the
> > > subreg bits are distributed?
> 
> …if the bits are distributed evenly in an obvious way, the hard
> register
> subreg should already have been folded to a reg.  I think subregs of
> hard registers are rare enough that we should just keep it simple
> and safe.

Ok! I'll make a proper patch out of this and post to gcc-patches then.

> 
> > > > Out of interest, why can't the subregs in the example above get
> > > > folded down to hard registers?
> > > 
> > > I think this is because the offsets are not 0.  I could imagine
> > > folding
> > > (subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a
> > > backend hook for this.  Does anything like this exist?  Also, can
> > > (subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply
> > > the second doubleword of 128-bit %v0 vector register.
> 
> For multi-register values with a regular layout,
> simplify_subreg_regno
> (via subreg_get_info) uses various target queries to work out which
> hard registers are actually being accessed.  Is the subreg above
> for big-endian or little-endian?  (I hope big-endian given the
> 8 offset.)

This is big-endian (s390).  Thanks for the info, register-related
target queries seem to be pretty extensive.  I still couldn't see
anything that would answer e.g. "what bits are accessed by
(reg:TF %f0)" with "bits 0-64 of %f0 and bits 0-64 of %f2", but that
level of detail is probably not needed anyway.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-09-11 14:28 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-09  9:50 subreg vs vec_select Ilya Leoshkevich
2020-09-09 21:09 ` Segher Boessenkool
2020-09-10 10:21   ` Ilya Leoshkevich
2020-09-10 22:30     ` Segher Boessenkool
2020-09-11  9:46     ` Richard Sandiford
2020-09-11 10:17       ` Ilya Leoshkevich
2020-09-11 10:22         ` Ilya Leoshkevich
2020-09-11 11:14           ` Richard Sandiford
2020-09-11 14:27             ` Ilya Leoshkevich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).