[PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
@ 2023-12-23  8:58 YunQiang Su
  2023-12-23 16:51 ` Jeff Law
  2024-01-03 23:39 ` Richard Sandiford
  0 siblings, 2 replies; 16+ messages in thread
From: YunQiang Su @ 2023-12-23  8:58 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, pinskia, jeffreyalaw, rguenther, YunQiang Su

On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true platforms,
if 31 or above bits is polluted by an bitops, we will need an
truncate. Let's emit one, and mark let's use the same hardreg
as in and out, the RTL may like:

(insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
        (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
     (nil))

We use /s/u flags to mark it as really needed, as in
combine_simplify_rtx, this insn may be considered as truncated,
so let's skip this combination.

gcc/ChangeLog:
        PR: 104914.
        * combine.cc (try_combine): Skip combine with truncate if
	dest is subreg and has /u/s flags on platforms
	TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.
	* expr.cc (expand_assignment): Emit a truncate insn, if
	31+ bits is polluted for SImode.

gcc/testsuite/ChangeLog:
	PR: 104914.
	* gcc.target/mips/pr104914.c: New testcase.
---
 gcc/combine.cc                           | 23 +++++++++++++++++++++-
 gcc/expr.cc                              | 17 ++++++++++++++++
 gcc/testsuite/gcc.target/mips/pr104914.c | 25 ++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 1cda4dd57f2..04b9c414053 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3294,6 +3294,28 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
       n_occurrences = 0;		/* `subst' counts here */
       subst_low_luid = DF_INSN_LUID (i2);
 
+      /* Don't try to combine a TRUNCATE INSN, if it's DEST is SUBREG and has
+	 FLAG /s/u.  We use these 2 flags to mark this INSN as really needed:
+	 normally, it means that the bits of 31+ of this variable is polluted
+	 by a bitops.  The reason of existing of case (subreg:SI (reg:DI)) is
+	 that, the same hardreg may act as src and dest.  */
+      if (TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)
+	  && INSN_P (i2))
+	{
+	  rtx i2dest_o = SET_DEST (PATTERN (i2));
+	  rtx i2src_o = SET_SRC (PATTERN (i2));
+	  if (GET_CODE (i2dest_o) == SUBREG
+	      && GET_MODE (i2dest_o) == SImode
+	      && GET_MODE (SUBREG_REG (i2dest_o)) == DImode
+	      && SUBREG_PROMOTED_VAR_P (i2dest_o)
+	      && SUBREG_PROMOTED_GET (i2dest_o) == SRP_SIGNED
+	      && GET_CODE (i2src_o) == TRUNCATE
+	      && GET_MODE (i2src_o) == SImode
+	      && rtx_equal_p (SUBREG_REG (i2dest_o), XEXP (i2src_o, 0))
+	      )
+	    return 0;
+	}
+
       /* If I1 feeds into I2 and I1DEST is in I1SRC, we need to make a unique
 	 copy of I2SRC each time we substitute it, in order to avoid creating
 	 self-referential RTL when we will be substituting I1SRC for I1DEST
@@ -5326,7 +5348,6 @@ find_split_point (rtx *loc, rtx_insn *insn, bool set_src)
 
    UNIQUE_COPY is true if each substitution must be unique.  We do this
    by copying if `n_occurrences' is nonzero.  */
-
 static rtx
 subst (rtx x, rtx from, rtx to, bool in_dest, bool in_cond, bool unique_copy)
 {
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9fef2bf6585..f7236040a34 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6284,6 +6284,23 @@ expand_assignment (tree to, tree from, bool nontemporal)
 					nontemporal, reversep);
 		  convert_move (SUBREG_REG (to_rtx), to_rtx1,
 				SUBREG_PROMOTED_SIGN (to_rtx));
+
+		  rtx last = get_last_insn ();
+		  if (TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)
+		      && known_ge (bitregion_end, 31)
+		      && SUBREG_PROMOTED_VAR_P (to_rtx)
+		      && SUBREG_PROMOTED_SIGN (to_rtx) == SRP_SIGNED
+		      && GET_MODE (to_rtx) == SImode
+		      && GET_MODE (SUBREG_REG (to_rtx)) == DImode
+		      && GET_CODE (SET_SRC (PATTERN (last))) == SIGN_EXTEND
+		      )
+		    {
+		      insn_code icode = convert_optab_handler
+						(trunc_optab, SImode, DImode);
+		      if (icode != CODE_FOR_nothing)
+			emit_unop_insn (icode, to_rtx,
+					SUBREG_REG (to_rtx), TRUNCATE);
+		    }
 		}
 	    }
 	  else
diff --git a/gcc/testsuite/gcc.target/mips/pr104914.c b/gcc/testsuite/gcc.target/mips/pr104914.c
new file mode 100644
index 00000000000..5dd10e84c17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/pr104914.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-mabi=64" } */
+
+extern void abort (void);
+extern void exit (int);
+
+NOMIPS16 int test (const unsigned char *buf)
+{
+  int val;
+  ((unsigned char*)&val)[0] = *buf++;
+  ((unsigned char*)&val)[1] = *buf++;
+  ((unsigned char*)&val)[2] = *buf++;
+  ((unsigned char*)&val)[3] = *buf++;
+  if(val > 0)
+    return 1;
+  else
+    return 0;
+}
+
+int main ()
+{
+  if (test("\xff\xff\xff\xff") != 0)
+    abort();
+  exit(0);
+}
-- 
2.39.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-23  8:58 [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode YunQiang Su
@ 2023-12-23 16:51 ` Jeff Law
  2023-12-23 22:46   ` YunQiang Su
  2024-01-03 23:39 ` Richard Sandiford
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff Law @ 2023-12-23 16:51 UTC (permalink / raw)
  To: YunQiang Su, gcc-patches; +Cc: richard.sandiford, pinskia, rguenther



On 12/23/23 01:58, YunQiang Su wrote:
> On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true platforms,
> if 31 or above bits is polluted by an bitops, we will need an
> truncate. Let's emit one, and mark let's use the same hardreg
> as in and out, the RTL may like:
> 
> (insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
>          (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
>       (nil))
> 
> We use /s/u flags to mark it as really needed, as in
> combine_simplify_rtx, this insn may be considered as truncated,
> so let's skip this combination.
> 
> gcc/ChangeLog:
>          PR: 104914.
>          * combine.cc (try_combine): Skip combine with truncate if
> 	dest is subreg and has /u/s flags on platforms
> 	TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.
> 	* expr.cc (expand_assignment): Emit a truncate insn, if
> 	31+ bits is polluted for SImode.
> 
> gcc/testsuite/ChangeLog:
> 	PR: 104914.
> 	* gcc.target/mips/pr104914.c: New testcase.
I would suggest you show the RTL before/after whatever transformation 
has caused problems on your target and explain why you think the 
transformation is incorrect.

Focus on the RTL semantics as well as the target specific semantics 
because both are critically important here.

I strongly suspect you're just papering over a problem elsewhere.


> ---
>   gcc/combine.cc                           | 23 +++++++++++++++++++++-
>   gcc/expr.cc                              | 17 ++++++++++++++++
>   gcc/testsuite/gcc.target/mips/pr104914.c | 25 ++++++++++++++++++++++++
>   3 files changed, 64 insertions(+), 1 deletion(-)
>   create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 1cda4dd57f2..04b9c414053 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3294,6 +3294,28 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>         n_occurrences = 0;		/* `subst' counts here */
>         subst_low_luid = DF_INSN_LUID (i2);
>   
> +      /* Don't try to combine a TRUNCATE INSN, if it's DEST is SUBREG and has
> +	 FLAG /s/u.  We use these 2 flags to mark this INSN as really needed:
> +	 normally, it means that the bits of 31+ of this variable is polluted
> +	 by a bitops.  The reason of existing of case (subreg:SI (reg:DI)) is
> +	 that, the same hardreg may act as src and dest.  */
> +      if (TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)
> +	  && INSN_P (i2))
> +	{
> +	  rtx i2dest_o = SET_DEST (PATTERN (i2));
> +	  rtx i2src_o = SET_SRC (PATTERN (i2));
> +	  if (GET_CODE (i2dest_o) == SUBREG
> +	      && GET_MODE (i2dest_o) == SImode
> +	      && GET_MODE (SUBREG_REG (i2dest_o)) == DImode
> +	      && SUBREG_PROMOTED_VAR_P (i2dest_o)
> +	      && SUBREG_PROMOTED_GET (i2dest_o) == SRP_SIGNED
> +	      && GET_CODE (i2src_o) == TRUNCATE
> +	      && GET_MODE (i2src_o) == SImode
> +	      && rtx_equal_p (SUBREG_REG (i2dest_o), XEXP (i2src_o, 0))
> +	      )
> +	    return 0;
> +	}
So checking SI/DI like this is just wrong.  There's nothing special 
about SI/DI.    Checking for equality between the destination and source 
also seems wrong -- if the state of the sign bit is wrong, it's wrong 
regardless of whether or not the source/destination register is the same.


>
> @@ -5326,7 +5348,6 @@ find_split_point (rtx *loc, rtx_insn *insn, bool set_src)
>   
>      UNIQUE_COPY is true if each substitution must be unique.  We do this
>      by copying if `n_occurrences' is nonzero.  */
> -
>   static rtx
>   subst (rtx x, rtx from, rtx to, bool in_dest, bool in_cond, bool unique_copy)
>   {
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 9fef2bf6585..f7236040a34 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -6284,6 +6284,23 @@ expand_assignment (tree to, tree from, bool nontemporal)
>   					nontemporal, reversep);
>   		  convert_move (SUBREG_REG (to_rtx), to_rtx1,
>   				SUBREG_PROMOTED_SIGN (to_rtx));
> +
> +		  rtx last = get_last_insn ();
> +		  if (TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)
> +		      && known_ge (bitregion_end, 31)
> +		      && SUBREG_PROMOTED_VAR_P (to_rtx)
> +		      && SUBREG_PROMOTED_SIGN (to_rtx) == SRP_SIGNED
> +		      && GET_MODE (to_rtx) == SImode
> +		      && GET_MODE (SUBREG_REG (to_rtx)) == DImode
> +		      && GET_CODE (SET_SRC (PATTERN (last))) == SIGN_EXTEND
> +		      )
> +		    {
> +		      insn_code icode = convert_optab_handler
> +						(trunc_optab, SImode, DImode);
> +		      if (icode != CODE_FOR_nothing)
> +			emit_unop_insn (icode, to_rtx,
> +					SUBREG_REG (to_rtx), TRUNCATE);
> +		    }
Similar comments about the modes apply here.

But again, my sense is there's a higher level problem here and that 
these changes are just papering over it.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-23 16:51 ` Jeff Law
@ 2023-12-23 22:46   ` YunQiang Su
  2023-12-24  5:27     ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: YunQiang Su @ 2023-12-23 22:46 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, richard.sandiford, pinskia, rguenther

Jeff Law <jeffreyalaw@gmail.com> 于2023年12月24日周日 00:51写道：
>
>
>
> On 12/23/23 01:58, YunQiang Su wrote:
> > On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true platforms,
> > if 31 or above bits is polluted by an bitops, we will need an
> > truncate. Let's emit one, and mark let's use the same hardreg
> > as in and out, the RTL may like:
> >
> > (insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
> >          (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
> >       (nil))
> >
> > We use /s/u flags to mark it as really needed, as in
> > combine_simplify_rtx, this insn may be considered as truncated,
> > so let's skip this combination.
> >
> > gcc/ChangeLog:
> >          PR: 104914.
> >          * combine.cc (try_combine): Skip combine with truncate if
> >       dest is subreg and has /u/s flags on platforms
> >       TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.
> >       * expr.cc (expand_assignment): Emit a truncate insn, if
> >       31+ bits is polluted for SImode.
> >
> > gcc/testsuite/ChangeLog:
> >       PR: 104914.
> >       * gcc.target/mips/pr104914.c: New testcase.
> I would suggest you show the RTL before/after whatever transformation
> has caused problems on your target and explain why you think the
> transformation is incorrect.
>

Before this patch, the RTL is like this
     (insn 19 18 20 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
               (const_int 8 [0x8])
               (const_int 24 [0x18]))
           (subreg:DI (reg:QI 205) 0)) "../xx.c":7:29 -1
        (nil))
      (insn 20 19 23 2 (set (reg/v:DI 200 [ val ])
           (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0)))
"../xx.c":7:29 -1
       (nil))
     (jump_insn 23 20 24 2 (set (pc)
           (if_then_else (lt (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
                   (const_int 0 [0]))
                (label_ref 32)
               (pc))) "../xx.c":10:5 -1
       (int_list:REG_BR_PROB 440234148 (nil))
      -> 32)

and then, when combine
      (insn 20 19 23 2 (set (reg/v:DI 200 [ val ])
             (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0)))
"../xx.c":7:29 -1
      (nil))
will be convert to
          (note 20 19 23 2 NOTE_INSN_DELETED)
MIPS claims TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true
based on that the hard register is always sign-extended, but here
the hard register is polluted by zero_extract.

If we just patch combine.cc to make it not eat sign_extend, here,
sign_extend will still disappear in the later passes, due to mips define
sign_extend as "emit_note (NOTE_INSN_DELETED)".

So I tried to insert a new truncate RTX here,
    (insn 21 20 24 2 (set (reg/v:DI 200 [ val ])
             (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
         (nil))
This is the RTL for this C code
     int32_t fun (int64_t arg) {
          int32_t a = (int32_t) arg;
          return a;
     }
But, the `reload` pass will get an ICE. I haven't dig the real problem.
If the new RTX is
    (insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
           (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
       (nil))
`reload` pass will happily accept it, and then it is converted to
     # this instruction will be sure the reg is well sign extended.
     `sll $rN, $rN, 0`
hard instruction.

The problem is that simple-rtx (called by combine) will believe that
REG 200 has been truncated to SImode, as the dest has an
subreg:SI.

So, I use /s/u flags to tell combine don't do so.

> Focus on the RTL semantics as well as the target specific semantics
> because both are critically important here.
>
> I strongly suspect you're just papering over a problem elsewhere.
>

Yes. I also guess so.  Any new idea?
In the previous threads, you suggested that we can just insert an
truncate instruction just before the comparison.
It still have some problem:
     1. There may be no comparison just after the zero_extract,
         instead some normal calculation, such as add/sub.
         Then, the calculation will get a malformed register, and
         in the ISA document, it is claimed UNPREDICTABLE.
     2. Insert an RTX before every comparison will cause performance
         regression, since in the most case, it is not needed.
     3. Inserting an RTX before comparison still needs some dirty hack
         like this.

>
> > ---
> >   gcc/combine.cc                           | 23 +++++++++++++++++++++-
> >   gcc/expr.cc                              | 17 ++++++++++++++++
> >   gcc/testsuite/gcc.target/mips/pr104914.c | 25 ++++++++++++++++++++++++
> >   3 files changed, 64 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c
> >
> > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > index 1cda4dd57f2..04b9c414053 100644
> > --- a/gcc/combine.cc
> > +++ b/gcc/combine.cc
> > @@ -3294,6 +3294,28 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
> >         n_occurrences = 0;            /* `subst' counts here */
> >         subst_low_luid = DF_INSN_LUID (i2);
> >
> > +      /* Don't try to combine a TRUNCATE INSN, if it's DEST is SUBREG and has
> > +      FLAG /s/u.  We use these 2 flags to mark this INSN as really needed:
> > +      normally, it means that the bits of 31+ of this variable is polluted
> > +      by a bitops.  The reason of existing of case (subreg:SI (reg:DI)) is
> > +      that, the same hardreg may act as src and dest.  */
> > +      if (TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)
> > +       && INSN_P (i2))
> > +     {
> > +       rtx i2dest_o = SET_DEST (PATTERN (i2));
> > +       rtx i2src_o = SET_SRC (PATTERN (i2));
> > +       if (GET_CODE (i2dest_o) == SUBREG
> > +           && GET_MODE (i2dest_o) == SImode
> > +           && GET_MODE (SUBREG_REG (i2dest_o)) == DImode
> > +           && SUBREG_PROMOTED_VAR_P (i2dest_o)
> > +           && SUBREG_PROMOTED_GET (i2dest_o) == SRP_SIGNED
> > +           && GET_CODE (i2src_o) == TRUNCATE
> > +           && GET_MODE (i2src_o) == SImode
> > +           && rtx_equal_p (SUBREG_REG (i2dest_o), XEXP (i2src_o, 0))
> > +           )
> > +         return 0;
> > +     }
> So checking SI/DI like this is just wrong.  There's nothing special
> about SI/DI.    Checking for equality between the destination and source

With some test, when the case of 8->32, 8->64, and 16->64, 16->64
don't have this problem, as sign_extend to SI/DI from HI/QI is not
NOOP, such as this RTX
    (insn 11 10 12 2 (set (reg/v:DI 198 [ val ])
           (sign_extend:DI (subreg:HI (reg/v:DI 198 [ val ]) 0))) "yy.c":4:29 -1
        (nil))
will be converted to:
      seh $rN, $rN

> also seems wrong -- if the state of the sign bit is wrong, it's wrong
> regardless of whether or not the source/destination register is the same.
>

I guess so, too.
In this patch, I try my best to be very careful not to break something else,
so I use very strict tests for this case.
The reason, I use this rtx_equal_p, based on that in expr.cc (see bellow),
I use
          emit_unop_insn (icode, to_rtx,
                     SUBREG_REG (to_rtx), TRUNCATE);

And in fact, if source/dest are different, the subreg in dest won't be needed,
In this case, combine won't eat this truncate insn.

I try my best to reduce the influence.

>
> >
> > @@ -5326,7 +5348,6 @@ find_split_point (rtx *loc, rtx_insn *insn, bool set_src)
> >
> >      UNIQUE_COPY is true if each substitution must be unique.  We do this
> >      by copying if `n_occurrences' is nonzero.  */
> > -
> >   static rtx
> >   subst (rtx x, rtx from, rtx to, bool in_dest, bool in_cond, bool unique_copy)
> >   {
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index 9fef2bf6585..f7236040a34 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -6284,6 +6284,23 @@ expand_assignment (tree to, tree from, bool nontemporal)
> >                                       nontemporal, reversep);
> >                 convert_move (SUBREG_REG (to_rtx), to_rtx1,
> >                               SUBREG_PROMOTED_SIGN (to_rtx));
> > +
> > +               rtx last = get_last_insn ();
> > +               if (TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)
> > +                   && known_ge (bitregion_end, 31)
> > +                   && SUBREG_PROMOTED_VAR_P (to_rtx)
> > +                   && SUBREG_PROMOTED_SIGN (to_rtx) == SRP_SIGNED
> > +                   && GET_MODE (to_rtx) == SImode
> > +                   && GET_MODE (SUBREG_REG (to_rtx)) == DImode
> > +                   && GET_CODE (SET_SRC (PATTERN (last))) == SIGN_EXTEND
> > +                   )
> > +                 {
> > +                   insn_code icode = convert_optab_handler
> > +                                             (trunc_optab, SImode, DImode);
> > +                   if (icode != CODE_FOR_nothing)
> > +                     emit_unop_insn (icode, to_rtx,
> > +                                     SUBREG_REG (to_rtx), TRUNCATE);
> > +                 }
> Similar comments about the modes apply here.
>
> But again, my sense is there's a higher level problem here and that
> these changes are just papering over it.
>
> Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-23 22:46   ` YunQiang Su
@ 2023-12-24  5:27     ` Jeff Law
  2023-12-24  8:11       ` YunQiang Su
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2023-12-24  5:27 UTC (permalink / raw)
  To: YunQiang Su; +Cc: gcc-patches, richard.sandiford, pinskia, rguenther



On 12/23/23 15:46, YunQiang Su wrote:
> Jeff Law <jeffreyalaw@gmail.com> 于2023年12月24日周日 00:51写道：
>>
>>
>>
>> On 12/23/23 01:58, YunQiang Su wrote:
>>> On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true platforms,
>>> if 31 or above bits is polluted by an bitops, we will need an
>>> truncate. Let's emit one, and mark let's use the same hardreg
>>> as in and out, the RTL may like:
>>>
>>> (insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
>>>           (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
>>>        (nil))
>>>
>>> We use /s/u flags to mark it as really needed, as in
>>> combine_simplify_rtx, this insn may be considered as truncated,
>>> so let's skip this combination.
>>>
>>> gcc/ChangeLog:
>>>           PR: 104914.
>>>           * combine.cc (try_combine): Skip combine with truncate if
>>>        dest is subreg and has /u/s flags on platforms
>>>        TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.
>>>        * expr.cc (expand_assignment): Emit a truncate insn, if
>>>        31+ bits is polluted for SImode.
>>>
>>> gcc/testsuite/ChangeLog:
>>>        PR: 104914.
>>>        * gcc.target/mips/pr104914.c: New testcase.
>> I would suggest you show the RTL before/after whatever transformation
>> has caused problems on your target and explain why you think the
>> transformation is incorrect.
>>
> 
> Before this patch, the RTL is like this
>       (insn 19 18 20 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
>                 (const_int 8 [0x8])
>                 (const_int 24 [0x18]))
>             (subreg:DI (reg:QI 205) 0)) "../xx.c":7:29 -1
>          (nil))
>        (insn 20 19 23 2 (set (reg/v:DI 200 [ val ])
>             (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0)))
> "../xx.c":7:29 -1
>         (nil))
>       (jump_insn 23 20 24 2 (set (pc)
>             (if_then_else (lt (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
>                     (const_int 0 [0]))
>                  (label_ref 32)
>                 (pc))) "../xx.c":10:5 -1
>         (int_list:REG_BR_PROB 440234148 (nil))
>        -> 32)
> 
> and then, when combine
>        (insn 20 19 23 2 (set (reg/v:DI 200 [ val ])
>               (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0)))
> "../xx.c":7:29 -1
>        (nil))
> will be convert to
>            (note 20 19 23 2 NOTE_INSN_DELETED)
> MIPS claims TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true
> based on that the hard register is always sign-extended, but here
> the hard register is polluted by zero_extract.
> 
> If we just patch combine.cc to make it not eat sign_extend, here,
> sign_extend will still disappear in the later passes, due to mips define
> sign_extend as "emit_note (NOTE_INSN_DELETED)".
> 
> So I tried to insert a new truncate RTX here,
>      (insn 21 20 24 2 (set (reg/v:DI 200 [ val ])
>               (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
>           (nil))
> This is the RTL for this C code
>       int32_t fun (int64_t arg) {
>            int32_t a = (int32_t) arg;
>            return a;
>       }
> But, the `reload` pass will get an ICE. I haven't dig the real problem.
> If the new RTX is
>      (insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
>             (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
>         (nil))
> `reload` pass will happily accept it, and then it is converted to
>       # this instruction will be sure the reg is well sign extended.
>       `sll $rN, $rN, 0`
> hard instruction.
> 
> The problem is that simple-rtx (called by combine) will believe that
> REG 200 has been truncated to SImode, as the dest has an
> subreg:SI.
> 
> So, I use /s/u flags to tell combine don't do so.
> 
>> Focus on the RTL semantics as well as the target specific semantics
>> because both are critically important here.
>>
>> I strongly suspect you're just papering over a problem elsewhere.
>>
> 
> Yes. I also guess so.  Any new idea?
Well, I see multiple intertwined issues and I think MIPS has largely 
mucked this up.

At a high level DI -> SI truncation is not a nop on MIPS64.  We must 
explicitly sign extend the value from SI->DI to preserve the invariant 
that SI mode objects are extended to DImode.  If we fail to do that, 
then the SImode conditional branch patterns simply aren't going to work.

What doesn't make sense to me is that for truncation, the output mode is 
going to be smaller than the input mode.  Which makes logical sense and 
is codified in the documentation:

> @deftypefn {Target Hook} bool TARGET_TRULY_NOOP_TRUNCATION (poly_uint64 @var{outprec}, poly_uint64 @var{inprec})
> This hook returns true if it is safe to ``convert'' a value of
> @var{inprec} bits to one of @var{outprec} bits (where @var{outprec} is
> smaller than @var{inprec}) by merely operating on it as if it had only
> @var{outprec} bits.  The default returns true unconditionally, which
> is correct for most machines.  When @code{TARGET_TRULY_NOOP_TRUNCATION}
> returns false, the machine description should provide a @code{trunc}
> optab to specify the RTL that performs the required truncation.


Yet the implementation in the mips backend:

> static bool
> mips_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
> {
>   return !TARGET_64BIT || inprec <= 32 || outprec > 32;
> }


Can you verify what values are getting in here?  If we're being called 
with inprec as 32 and outprec as 64, we're going to return true which 
makes absolutely no sense at all.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-24  5:27     ` Jeff Law
@ 2023-12-24  8:11       ` YunQiang Su
  2023-12-28 18:11         ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: YunQiang Su @ 2023-12-24  8:11 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, richard.sandiford, pinskia, rguenther

> > Yes. I also guess so.  Any new idea?
> Well, I see multiple intertwined issues and I think MIPS has largely
> mucked this up.
>
> At a high level DI -> SI truncation is not a nop on MIPS64.  We must
> explicitly sign extend the value from SI->DI to preserve the invariant
> that SI mode objects are extended to DImode.  If we fail to do that,
> then the SImode conditional branch patterns simply aren't going to work.
>

MIPS64 never claims DI -> SI is nop, instead it claims SI -> DI is nop.
And for MIPS64, it has only one type of branch. it works for both SI and DI.

MIPS64 has 3 groups of instructions:
1. The instructions from MIPS32, 32bit calculations included.
    These instructions expect the source values are properly sign-extended,
               otherwise, the result is UNPREDICTABLE.
    And they make sure that the Destinations are  sign-extended.
    Let's use INS here as an example.
2. The newly added 64bit ops.
    Let's use DINS here as an example.
3. branch instructions
     They works depending on the value of the registers.

> What doesn't make sense to me is that for truncation, the output mode is
> going to be smaller than the input mode.  Which makes logical sense and
> is codified in the documentation:
>
> > @deftypefn {Target Hook} bool TARGET_TRULY_NOOP_TRUNCATION (poly_uint64 @var{outprec}, poly_uint64 @var{inprec})
> > This hook returns true if it is safe to ``convert'' a value of
> > @var{inprec} bits to one of @var{outprec} bits (where @var{outprec} is
> > smaller than @var{inprec}) by merely operating on it as if it had only
> > @var{outprec} bits.  The default returns true unconditionally, which
> > is correct for most machines.  When @code{TARGET_TRULY_NOOP_TRUNCATION}
> > returns false, the machine description should provide a @code{trunc}
> > optab to specify the RTL that performs the required truncation.
>
>
> Yet the implementation in the mips backend:
>
> > static bool
> > mips_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
> > {
> >   return !TARGET_64BIT || inprec <= 32 || outprec > 32;
> > }
>
>
> Can you verify what values are getting in here?  If we're being called
> with inprec as 32 and outprec as 64, we're going to return true which
> makes absolutely no sense at all.
>

One of a key design rule of MIPS, is to reduce hardware mode switch as possible.

Converting from 32 to 64 does be nop, IF the 32 is properly sign extended.
Since the nature of the 1st group of instructions (the result is
always sign extended)
MIPS's software stack carefully keeps SI values sign-extended.
So MIPS64 can claims SI -> DI is nop.

With this design, supporting 32bit software on 64bit hardware is smoothly,
without any mode switch etc.

The only exception is here: the 64bit bitops pollute a SI mode hard register:
        lbu     $3,3($4)
        dins    $2,$3,24,8
        bltz    $2,.L2
LBU loads a value from memory. It's OK now.
and DINS insert it to $2, and it will  pollute bit 31, since DINS is a
newly added
MIPS64 instruction, it won't copy bit31 to bits[32:63].
It will make $2, now a malformed SI mode hard register (31-63 bit is
not ALL same).
Then BLTZ will believe it is a normal DI/64bit value.

We have two options here to solve this problem:
1. Use INS instead of DINS.
    In the original C code, the variable `val` is a value with SI mode.
    So we should use INS here, as INS will to be sure the result is
well sign-extended.
    That's what I tried with the previous patches:
    [V1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624174.html
    [V2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626137.html

2. Add an truncate or sign_extent instruction just after the instruction, which
    may pollute the 31+bits of a SImode registers.
    That's what I am trying to do with this V3 patch.
    The result asm code will be like this:
        lbu     $3,3($4)
        dins    $2,$3,24,8
        sll       $2,$2,0   # <--- newly added instruction
        bltz    $2,.L2
    Background: SLL (shift left logical) is a special instruction in MIPS:
         1. SLL rX,rY,0 on MIPS64 is 32bit sign-extend operation.
         2. SLL r0,r0,0 is NOP instruction.
         3. SLL r0,r0,1 is SSNOP (Superscalar No Operation)
         4. SLL r0,r0,3 is EHB (Execution Hazard Barrier)




> Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-24  8:11       ` YunQiang Su
@ 2023-12-28 18:11         ` Jeff Law
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Law @ 2023-12-28 18:11 UTC (permalink / raw)
  To: YunQiang Su; +Cc: gcc-patches, richard.sandiford, pinskia, rguenther



On 12/24/23 01:11, YunQiang Su wrote:
>>> Yes. I also guess so.  Any new idea?
>> Well, I see multiple intertwined issues and I think MIPS has largely
>> mucked this up.
>>
>> At a high level DI -> SI truncation is not a nop on MIPS64.  We must
>> explicitly sign extend the value from SI->DI to preserve the invariant
>> that SI mode objects are extended to DImode.  If we fail to do that,
>> then the SImode conditional branch patterns simply aren't going to work.
>>
> 
> MIPS64 never claims DI -> SI is nop, instead it claims SI -> DI is nop.
And that just seems wrong, at least for truncation which implies the 
input precision must be larger than the output precision.

If you adjust the mips implementation of TARGET_TRULY_NOOP_TRUNCATION to 
return false when the input precision is smaller than the output 
precision, does that fix this problem?


> And for MIPS64, it has only one type of branch. it works for both SI and DI.
Agreed, but the SImode variant is really just a DImode comparison that 
relies on the sign extending property of the MIPS architecture.  I'm not 
100% sure that's safe in the presence of bit manipulation instructions 
which do not preserve the sign extending property.  We actually don't 
allow some bit manipulations on RV64 for a similar underlying reason.


> 
> Converting from 32 to 64 does be nop, IF the 32 is properly sign extended.
But that's not a *truncation*, that's an *extension*.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-23  8:58 [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode YunQiang Su
  2023-12-23 16:51 ` Jeff Law
@ 2024-01-03 23:39 ` Richard Sandiford
  2024-01-09 18:49   ` Jeff Law
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2024-01-03 23:39 UTC (permalink / raw)
  To: YunQiang Su; +Cc: gcc-patches, pinskia, jeffreyalaw, rguenther

YunQiang Su <syq@gcc.gnu.org> writes:
> On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true platforms,
> if 31 or above bits is polluted by an bitops, we will need an
> truncate. Let's emit one, and mark let's use the same hardreg
> as in and out, the RTL may like:
>
> (insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
>         (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
>      (nil))
>
> We use /s/u flags to mark it as really needed, as in
> combine_simplify_rtx, this insn may be considered as truncated,
> so let's skip this combination.
>
> gcc/ChangeLog:
>         PR: 104914.
>         * combine.cc (try_combine): Skip combine with truncate if
> 	dest is subreg and has /u/s flags on platforms
> 	TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.
> 	* expr.cc (expand_assignment): Emit a truncate insn, if
> 	31+ bits is polluted for SImode.
>
> gcc/testsuite/ChangeLog:
> 	PR: 104914.
> 	* gcc.target/mips/pr104914.c: New testcase.

Sorry for not looking at this earlier.  I've got a bit lost in the
various threads, so apologies if this has been discussed already
but I think the fix is:

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 4f42c0ff487..9847eba19fe 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6275,9 +6275,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
 	      else
 		{
 		  rtx to_rtx1
-		    = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
-				      SUBREG_REG (to_rtx),
-				      subreg_promoted_mode (to_rtx));
+		    = convert_to_mode (subreg_unpromoted_mode (to_rtx),
+				       SUBREG_REG (to_rtx), false);
 		  result = store_field (to_rtx1, bitsize, bitpos,
 					bitregion_start, bitregion_end,
 					mode1, from, get_alias_set (to),

(completely untested apart from the test case).  That should still
produce a subreg on most targets, but generates the required trunc
on MIPS.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2024-01-03 23:39 ` Richard Sandiford
@ 2024-01-09 18:49   ` Jeff Law
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Law @ 2024-01-09 18:49 UTC (permalink / raw)
  To: YunQiang Su, gcc-patches, pinskia, rguenther, richard.sandiford



On 1/3/24 16:39, Richard Sandiford wrote:
> YunQiang Su <syq@gcc.gnu.org> writes:
>> On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true platforms,
>> if 31 or above bits is polluted by an bitops, we will need an
>> truncate. Let's emit one, and mark let's use the same hardreg
>> as in and out, the RTL may like:
>>
>> (insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
>>          (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
>>       (nil))
>>
>> We use /s/u flags to mark it as really needed, as in
>> combine_simplify_rtx, this insn may be considered as truncated,
>> so let's skip this combination.
>>
>> gcc/ChangeLog:
>>          PR: 104914.
>>          * combine.cc (try_combine): Skip combine with truncate if
>> 	dest is subreg and has /u/s flags on platforms
>> 	TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.
>> 	* expr.cc (expand_assignment): Emit a truncate insn, if
>> 	31+ bits is polluted for SImode.
>>
>> gcc/testsuite/ChangeLog:
>> 	PR: 104914.
>> 	* gcc.target/mips/pr104914.c: New testcase.
> 
> Sorry for not looking at this earlier.  I've got a bit lost in the
> various threads, so apologies if this has been discussed already
> but I think the fix is:
> 
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 4f42c0ff487..9847eba19fe 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -6275,9 +6275,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
>   	      else
>   		{
>   		  rtx to_rtx1
> -		    = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
> -				      SUBREG_REG (to_rtx),
> -				      subreg_promoted_mode (to_rtx));
> +		    = convert_to_mode (subreg_unpromoted_mode (to_rtx),
> +				       SUBREG_REG (to_rtx), false);
>   		  result = store_field (to_rtx1, bitsize, bitpos,
>   					bitregion_start, bitregion_end,
>   					mode1, from, get_alias_set (to),
> 
> (completely untested apart from the test case).  That should still
> produce a subreg on most targets, but generates the required trunc
> on MIPS.
I think this is a subset of Roger's work that got committed a few days back.

jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-24 12:24       ` Roger Sayle
@ 2023-12-28 18:26         ` Jeff Law
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Law @ 2023-12-28 18:26 UTC (permalink / raw)
  To: Roger Sayle, 'YunQiang Su'; +Cc: gcc-patches



On 12/24/23 05:24, Roger Sayle wrote:
> 
>>>> What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't
>>>> actually a truncation!  The output precision is first, the input
>>>> precision is second.  The docs explicitly state the output precision
>>>> should be smaller than the input precision (which makes sense for truncation).
>>>>
>>>> That's where I'd start with trying to untangle this mess.
>>>
>>> Thanks (both) for correcting my misunderstanding.
>>> At the very least might I suggest that we introduce a new
>>> TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS can use for this
>>> purpose?  It'd help reduce confusion, and keep the
>>> documentation/function naming correct.
>>>
>>
>> Yes. It is good for me.
>> T_N_T_M_P is a really confusion naming.
> 
> Ignore my suggestion for a new target hook.  GCC already has one.
> You shouldn't be using TRULY_NOOP_TRUNCATION_MODES_P
> with incorrectly ordered arguments. The correct target hook is
> TARGET_MODE_REP_EXTENDED, which the MIPS backend correctly
> defines via mips_mode_rep_extended.
> 
> It's MIPS definition of (and interpretation of) mips_truly_noop_truncation
> that's suspect.
> 
> My latest theory is that these sign extensions should be:
> (set (reg:DI) (sign_extend:DI (truncate:SI (reg:DI))))
> and not
> (set (reg:DI) (sign_extend:DI (subreg:SI (reg:DI))))
In isolation these are the same.  I think the fact that the MIPS backend 
wipes out the sign extension turning the result into a NOP is what makes 
them different.

Of course that's kind of the point behind the TRULY_NOOP_TRUNCATION 
macro.  That's what allows the MIPS target to wipe out the sign extension.

ISTM this might be worth noting in the docs for TRULY_NOOP_TRUNCATION.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-24  9:15     ` YunQiang Su
  2023-12-24  9:28       ` Andrew Pinski
@ 2023-12-24 12:24       ` Roger Sayle
  2023-12-28 18:26         ` Jeff Law
  1 sibling, 1 reply; 16+ messages in thread
From: Roger Sayle @ 2023-12-24 12:24 UTC (permalink / raw)
  To: 'YunQiang Su'; +Cc: 'Jeff Law', gcc-patches

> > > What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't
> > > actually a truncation!  The output precision is first, the input
> > > precision is second.  The docs explicitly state the output precision
> > > should be smaller than the input precision (which makes sense for truncation).
> > >
> > > That's where I'd start with trying to untangle this mess.
> >
> > Thanks (both) for correcting my misunderstanding.
> > At the very least might I suggest that we introduce a new
> > TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS can use for this
> > purpose?  It'd help reduce confusion, and keep the
> > documentation/function naming correct.
> >
> 
> Yes. It is good for me.
> T_N_T_M_P is a really confusion naming.

Ignore my suggestion for a new target hook.  GCC already has one.
You shouldn't be using TRULY_NOOP_TRUNCATION_MODES_P
with incorrectly ordered arguments. The correct target hook is 
TARGET_MODE_REP_EXTENDED, which the MIPS backend correctly
defines via mips_mode_rep_extended.

It's MIPS definition of (and interpretation of) mips_truly_noop_truncation
that's suspect.

My latest theory is that these sign extensions should be:
(set (reg:DI) (sign_extend:DI (truncate:SI (reg:DI))))
and not
(set (reg:DI) (sign_extend:DI (subreg:SI (reg:DI))))
If the RTL optimizer's ever split this instruction the semantics of
the SUBREG intermediate are incorrect.  Another (less desirable)
approach might be to use an UNSPEC.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-24  9:15     ` YunQiang Su
@ 2023-12-24  9:28       ` Andrew Pinski
  2023-12-24 12:24       ` Roger Sayle
  1 sibling, 0 replies; 16+ messages in thread
From: Andrew Pinski @ 2023-12-24  9:28 UTC (permalink / raw)
  To: YunQiang Su; +Cc: Roger Sayle, Jeff Law, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3040 bytes --]

On Sun, Dec 24, 2023, 01:18 YunQiang Su <syq@gcc.gnu.org> wrote:

> Roger Sayle <roger@nextmovesoftware.com> 于2023年12月24日周日 16:51写道：
> >
> >
> > > What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't actually a
> > > truncation!  The output precision is first, the input precision is
> second.  The docs
> > > explicitly state the output precision should be smaller than the input
> precision
> > > (which makes sense for truncation).
> > >
> > > That's where I'd start with trying to untangle this mess.
> >
> > Thanks (both) for correcting my misunderstanding.
> > At the very least might I suggest that we introduce a new
> > TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS
> > can use for this purpose?  It'd help reduce confusion, and keep
> > the documentation/function naming correct.
> >
>
> Yes. It is good for me.
> T_N_T_M_P is a really confusion naming.
>
> > When Richard Sandiford "hookized" truly_noop_truncation in 2017
> > https://gcc.gnu.org/legacy-ml/gcc-patches/2017-09/msg00836.html
> > he mentions the inprec/outprec confusion [deciding not to add a
> > gcc_assert outprec < inprec here, which might be a good idea].
> >
> > The next question is whether this is just
> > TRULY_NOOP_SIGN_EXTENSION_MODES_P
> > or whether there are any targets that usefully ensure some modes
> > are zero-extended forms of others.  TRULY_NOOP_ZERO_EXTENSION...
> >
>
> I guess ARM64 is the one TRULY_NOOP_ZERO_EXTENSION?
>

I am not 100% convinced here that is true. Yes aarch64 has many zero-extend
instruction and ones that ignore the top 32 bits. That is a different
requirement from mips.



> > My vote is that a DINS instruction that updates the most significant
> > bit of an SImode value should then expand or define_insn_and_split
> > with an explicit following sign-extension operation.  To avoid this being
> > eliminated by the RTL optimizers/combine the DINS should return a
> > DImode result, with the following extension truncating it to canonical
>
> Is it this one?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626137.html
>
> > SImode form.  This preserves the required backend invariant (and
> > doesn't require tweaking machine-independent code in combine).
> > SImode DINS instructions that don't/can't affect the MSB, can be a
> > single SImode instruction.
> >
>
> Yes. As most of MIPS microarchitecture, INS may have slight better
> performance than DINS.
>

This is not true. Cavium's octeon had the same performance characteristics
for dins and ins. Though I doubt that microarch matters any more.

Thanks,
Andrew




> While, I am worrying that: will some body do something like
>     INS <SI_REG1>,<DI_REG2>,24,8
> In this case, if <DI_REG2> is not sign-extended, the result will be
> UNPREDICTABLE.
> For this, now, I prefer to use DINS and append a SLL.
>
> I tried to write a C code that can produce this case, but not yet
> success.
>
>
> > Cheers,
> > Roger
> > --
> >
> >
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-24  8:51   ` Roger Sayle
@ 2023-12-24  9:15     ` YunQiang Su
  2023-12-24  9:28       ` Andrew Pinski
  2023-12-24 12:24       ` Roger Sayle
  0 siblings, 2 replies; 16+ messages in thread
From: YunQiang Su @ 2023-12-24  9:15 UTC (permalink / raw)
  To: Roger Sayle; +Cc: Jeff Law, gcc-patches

Roger Sayle <roger@nextmovesoftware.com> 于2023年12月24日周日 16:51写道：
>
>
> > What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't actually a
> > truncation!  The output precision is first, the input precision is second.  The docs
> > explicitly state the output precision should be smaller than the input precision
> > (which makes sense for truncation).
> >
> > That's where I'd start with trying to untangle this mess.
>
> Thanks (both) for correcting my misunderstanding.
> At the very least might I suggest that we introduce a new
> TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS
> can use for this purpose?  It'd help reduce confusion, and keep
> the documentation/function naming correct.
>

Yes. It is good for me.
T_N_T_M_P is a really confusion naming.

> When Richard Sandiford "hookized" truly_noop_truncation in 2017
> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-09/msg00836.html
> he mentions the inprec/outprec confusion [deciding not to add a
> gcc_assert outprec < inprec here, which might be a good idea].
>
> The next question is whether this is just
> TRULY_NOOP_SIGN_EXTENSION_MODES_P
> or whether there are any targets that usefully ensure some modes
> are zero-extended forms of others.  TRULY_NOOP_ZERO_EXTENSION...
>

I guess ARM64 is the one TRULY_NOOP_ZERO_EXTENSION?

> My vote is that a DINS instruction that updates the most significant
> bit of an SImode value should then expand or define_insn_and_split
> with an explicit following sign-extension operation.  To avoid this being
> eliminated by the RTL optimizers/combine the DINS should return a
> DImode result, with the following extension truncating it to canonical

Is it this one?
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626137.html

> SImode form.  This preserves the required backend invariant (and
> doesn't require tweaking machine-independent code in combine).
> SImode DINS instructions that don't/can't affect the MSB, can be a
> single SImode instruction.
>

Yes. As most of MIPS microarchitecture, INS may have slight better
performance than DINS.

While, I am worrying that: will some body do something like
    INS <SI_REG1>,<DI_REG2>,24,8
In this case, if <DI_REG2> is not sign-extended, the result will be
UNPREDICTABLE.
For this, now, I prefer to use DINS and append a SLL.

I tried to write a C code that can produce this case, but not yet
success.


> Cheers,
> Roger
> --
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-24  5:38 ` Jeff Law
@ 2023-12-24  8:51   ` Roger Sayle
  2023-12-24  9:15     ` YunQiang Su
  0 siblings, 1 reply; 16+ messages in thread
From: Roger Sayle @ 2023-12-24  8:51 UTC (permalink / raw)
  To: 'Jeff Law', gcc-patches, 'YunQiang Su'

> What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't actually a
> truncation!  The output precision is first, the input precision is second.  The docs
> explicitly state the output precision should be smaller than the input precision
> (which makes sense for truncation).
> 
> That's where I'd start with trying to untangle this mess.

Thanks (both) for correcting my misunderstanding.
At the very least might I suggest that we introduce a new
TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS
can use for this purpose?  It'd help reduce confusion, and keep
the documentation/function naming correct.

When Richard Sandiford "hookized" truly_noop_truncation in 2017
https://gcc.gnu.org/legacy-ml/gcc-patches/2017-09/msg00836.html
he mentions the inprec/outprec confusion [deciding not to add a
gcc_assert outprec < inprec here, which might be a good idea].

The next question is whether this is just
TRULY_NOOP_SIGN_EXTENSION_MODES_P
or whether there are any targets that usefully ensure some modes
are zero-extended forms of others.  TRULY_NOOP_ZERO_EXTENSION...

My vote is that a DINS instruction that updates the most significant
bit of an SImode value should then expand or define_insn_and_split
with an explicit following sign-extension operation.  To avoid this being
eliminated by the RTL optimizers/combine the DINS should return a
DImode result, with the following extension truncating it to canonical
SImode form.  This preserves the required backend invariant (and
doesn't require tweaking machine-independent code in combine).
SImode DINS instructions that don't/can't affect the MSB, can be a
single SImode instruction.

Cheers,
Roger
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-24  0:49 Roger Sayle
  2023-12-24  5:38 ` Jeff Law
@ 2023-12-24  8:29 ` YunQiang Su
  1 sibling, 0 replies; 16+ messages in thread
From: YunQiang Su @ 2023-12-24  8:29 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches, Jeff Law

Roger Sayle <roger@nextmovesoftware.com> 于2023年12月24日周日 08:49写道：
>
>
> Hi YunQiang (and Jeff),
>
> > MIPS claims TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true
> > based on that the hard register is always sign-extended, but here
> > the hard register is polluted by zero_extract.
>
> I suspect that the bug here is that the MIPS backend shouldn't be returning
> true for TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode).   It's true
> that the backend stores SImode values in DImode registers by sign extending
> them, but this doesn't mean that any DImode pseudo register can be truncated
> to an SImode pseudo just by SUBREG/register naming.  As you point out, if
> the
> high bits of a DImode value are random, truncation isn't a no-op, and
> requires
> an explicit sign-extension instruction.
>

Yes, you are right. While in most case, software/compiler carefully,
when work with SI variables, only instructions these instruction are used:
   1. the result of this instruction is proper sign-extended,
        normally, the instructions from MIPS32.
   2. or use LW to load the value: LW also will sign-extend the registers.

> There's a PR in Bugzilla around this representational issue on MIPS, but I
> can't find it straight away.
>
> Out of curiosity, how badly affected is the testsuite if mips.cc's
> mips_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
> is changed to just return !TARGET_64BIT ?
>

It will make some performance regression. Use our new tests as an example
The result will be:
        lbu     $2,0($4)
        SLL                             # newly added
        lbu     $3,1($4)
        dins    $2,$3,8,8
        SLL                             # newly added
        lbu     $3,2($4)
        dins    $2,$3,16,8
        SLL                             # newly added
        lbu     $3,3($4)
        dins    $2,$3,24,8
        sll       $2,$2

As my previous patch did, here, in fact if we replace all DINS with INS,
it will just work.

> I agree with Jeff there's an invariant that isn't correctly being modelled
> by
> the MIPS machine description.  A machine description probably shouldn't
> define an addsi3  pattern if what it actually supports is
> (sign_extend:DI (truncate:SI (plus:DI (reg:DI x) (reg:DI y))))
> Trying to model this as SImode addition plus a SUBREG_PROMOTED flag
> is less than ideal.
>

The addsi3 on MIPS64 is like:
(define_insn "*addsi3_extended"
 [(set (match_operand:DI 0 "register_operand" "=d,d")
   (sign_extend:DI
        (plus:SI (match_operand:SI 1 "register_operand" "d,d")
             (match_operand:SI 2 "arith_operand" "d,Q"))))]
 "TARGET_64BIT && !TARGET_MIPS16"
 "@
   addu\t%0,%1,%2
   addiu\t%0,%1,%2"
 [(set_attr "alu_type" "add")
  (set_attr "mode" "SI")])

It expect the source register is in SImode. And in fact in the ISA
documents: the result of ADD/ADDU will be UNPREDICTABLE
if sources is not well sign-extended.


> Just my thoughts.  I'm curious what other folks think.
>
> Cheers,
> Roger
> --
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
  2023-12-24  0:49 Roger Sayle
@ 2023-12-24  5:38 ` Jeff Law
  2023-12-24  8:51   ` Roger Sayle
  2023-12-24  8:29 ` YunQiang Su
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff Law @ 2023-12-24  5:38 UTC (permalink / raw)
  To: Roger Sayle, gcc-patches, 'YunQiang Su'



On 12/23/23 17:49, Roger Sayle wrote:
> 
> Hi YunQiang (and Jeff),
> 
>> MIPS claims TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) ==
>> true based on that the hard register is always sign-extended, but
>> here the hard register is polluted by zero_extract.
> 
> I suspect that the bug here is that the MIPS backend shouldn't be
> returning true for TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode).
> It's true that the backend stores SImode values in DImode registers
> by sign extending them, but this doesn't mean that any DImode pseudo
> register can be truncated to an SImode pseudo just by SUBREG/register
> naming.  As you point out, if the high bits of a DImode value are
> random, truncation isn't a no-op, and requires an explicit
> sign-extension instruction.
What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't actually a 
truncation!  The output precision is first, the input precision is 
second.  The docs explicitly state the output precision should be 
smaller than the input precision (which makes sense for truncation).

That's where I'd start with trying to untangle this mess.



> 
> I agree with Jeff there's an invariant that isn't correctly being
> modelled by the MIPS machine description.  A machine description
> probably shouldn't define an addsi3  pattern if what it actually
> supports is (sign_extend:DI (truncate:SI (plus:DI (reg:DI x) (reg:DI
> y)))) Trying to model this as SImode addition plus a SUBREG_PROMOTED
> flag is less than ideal.
It's less than ideal, but we ended up taking a similar approach in the 
RV world.  We actually have a subset of 32bit instructions in rv64, 
including a 32bit add.

The semantics are that it's a (sign_extend:DI (plus:SI (op1) (op2)))

Modeling it that way was actually critical in eliminating redundant sign 
extensions.

But regardless, it looks like there's something weird going on in the 
MIPS port.

jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
@ 2023-12-24  0:49 Roger Sayle
  2023-12-24  5:38 ` Jeff Law
  2023-12-24  8:29 ` YunQiang Su
  0 siblings, 2 replies; 16+ messages in thread
From: Roger Sayle @ 2023-12-24  0:49 UTC (permalink / raw)
  To: gcc-patches, 'YunQiang Su'; +Cc: 'Jeff Law'

Hi YunQiang (and Jeff),

> MIPS claims TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true
> based on that the hard register is always sign-extended, but here
> the hard register is polluted by zero_extract.

I suspect that the bug here is that the MIPS backend shouldn't be returning
true for TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode).   It's true
that the backend stores SImode values in DImode registers by sign extending
them, but this doesn't mean that any DImode pseudo register can be truncated
to an SImode pseudo just by SUBREG/register naming.  As you point out, if
the
high bits of a DImode value are random, truncation isn't a no-op, and
requires
an explicit sign-extension instruction.

There's a PR in Bugzilla around this representational issue on MIPS, but I
can't find it straight away.

Out of curiosity, how badly affected is the testsuite if mips.cc's
mips_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
is changed to just return !TARGET_64BIT ?

I agree with Jeff there's an invariant that isn't correctly being modelled
by
the MIPS machine description.  A machine description probably shouldn't
define an addsi3  pattern if what it actually supports is
(sign_extend:DI (truncate:SI (plus:DI (reg:DI x) (reg:DI y))))
Trying to model this as SImode addition plus a SUBREG_PROMOTED flag
is less than ideal.

Just my thoughts.  I'm curious what other folks think.

Cheers,
Roger
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-01-09 18:49 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-23  8:58 [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode YunQiang Su
2023-12-23 16:51 ` Jeff Law
2023-12-23 22:46   ` YunQiang Su
2023-12-24  5:27     ` Jeff Law
2023-12-24  8:11       ` YunQiang Su
2023-12-28 18:11         ` Jeff Law
2024-01-03 23:39 ` Richard Sandiford
2024-01-09 18:49   ` Jeff Law
2023-12-24  0:49 Roger Sayle
2023-12-24  5:38 ` Jeff Law
2023-12-24  8:51   ` Roger Sayle
2023-12-24  9:15     ` YunQiang Su
2023-12-24  9:28       ` Andrew Pinski
2023-12-24 12:24       ` Roger Sayle
2023-12-28 18:26         ` Jeff Law
2023-12-24  8:29 ` YunQiang Su

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).