From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Wx9h=DF=gmail.com=wzssyqa@sourceware.org>
Received: from mail-pl1-x635.google.com (mail-pl1-x635.google.com [IPv6:2607:f8b0:4864:20::635])
	by sourceware.org (Postfix) with ESMTPS id 55082385771D
	for <gcc-patches@gcc.gnu.org>; Wed, 19 Jul 2023 08:50:19 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 55082385771D
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-pl1-x635.google.com with SMTP id d9443c01a7336-1b9c5e07c1bso52455245ad.2
        for <gcc-patches@gcc.gnu.org>; Wed, 19 Jul 2023 01:50:19 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1689756618; x=1690361418;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=3iqEtsAycTY08Ci2IGoFB2iGeUCcq/eozCSEtuD+56Y=;
        b=IyPUnXoJFJnmXWKaYtlxfki9s85n6qQohIl4U2hw8kB36GeqUtuBtlEKZ3ZSBKWfJD
         r8PUEsvnU6ndng64fiOjlDUB2tE3zvH8qG95QQ4qeOV68i6/+t+nvMbZi0O4LaHIJVAI
         Zq+vPy/3BhDK1u5M1YOAN0IN9EXW8hiZmzmiqYKGjgLCOEhQSYvrMxLarVsh+q8OcB7+
         eJkatTKEsBOy5ilaLRBJ3XsypmPspL/rzRdJh79rL4q+ks4Ic6qE+EWuiUF9nuZBYqXA
         ullpffUdEs02xUrj7VQnlbKXeP0SuTJDYfHnVyVKFFm8n57g4GRyyHrcSYHnK4xO+lwY
         ZshA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1689756618; x=1690361418;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=3iqEtsAycTY08Ci2IGoFB2iGeUCcq/eozCSEtuD+56Y=;
        b=brLCjsI6gxYdbHyogN5WeF6gvs86Faf5+SVM8fnJqHMwPjT7IMBbD1DJY3OCk2pk8e
         kJh5nCQGUkuOVzEPvsMNvf9Y45fLrD4uGOE7o+yiad6gHYN83DWzawJcQLNZg0YIwtP6
         k+1EatMOBBQvWgoEXkHJDeKDZB6pIr9MHaYfMWxfkBNHaqtTb3aNQYWkhY6mKTToHqaM
         UGrS7vYAOaQDJdhPvVyu1jgEgqPzKWvDbSu8SkbR9obuHePkFW7E34l766/FBvaF/3zN
         ygJ1BP5DxZ5A+FO1jnBN5xc7STjI1SPY0TgCf8iA/a0vB6i/MCd0WJKjswEUTvH0jwll
         x8qQ==
X-Gm-Message-State: ABy/qLZjfag6oJxwlTbq/y+T8Phw79tl6c9N9dy+xYdcbDSowV5Mpkau
	kzbkWq5GWBKttZx0YvcOzI+2CpKrT44Qea19Xs4=
X-Google-Smtp-Source: APBJJlH1opHNLWe0xEdkaZXpyCSHX6mawnrGsTBYi9v4hUWKTmeM7+/hwZbHn6HnHPFOTq3d5kwiNODskhvm1fuz23Q=
X-Received: by 2002:a17:902:b58c:b0:1b8:a4e5:1735 with SMTP id
 a12-20020a170902b58c00b001b8a4e51735mr16480524pls.61.1689756617622; Wed, 19
 Jul 2023 01:50:17 -0700 (PDT)
MIME-Version: 1.0
References: <20230719041639.2967597-1-yunqiang.su@cipunited.com>
 <nycvar.YFH.7.77.849.2307190625340.4723@jbgna.fhfr.qr> <CAKcpw6V_DFvoKLfRKa7uNRyzOe6D19AA19eXAezz9tvANm3uXw@mail.gmail.com>
 <nycvar.YFH.7.77.849.2307190719080.4723@jbgna.fhfr.qr> <CAKcpw6Xe4R0pKgD+GD5eqjcLCK+gUnh0Gu6bw84nr0hNVf2svQ@mail.gmail.com>
 <CAKcpw6XOVRNcBYAWMLd6_ttUQe5M5KOJd79LDO5zj3=CORrfwA@mail.gmail.com>
In-Reply-To: <CAKcpw6XOVRNcBYAWMLd6_ttUQe5M5KOJd79LDO5zj3=CORrfwA@mail.gmail.com>
From: YunQiang Su <wzssyqa@gmail.com>
Date: Wed, 19 Jul 2023 16:50:06 +0800
Message-ID: <CAKcpw6VinaZHs4cF478Bb+T7QteHd_RryX2bAsB2Jb4ZQD52AA@mail.gmail.com>
Subject: Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible
To: Richard Biener <rguenther@suse.de>
Cc: YunQiang Su <yunqiang.su@cipunited.com>, gcc-patches@gcc.gnu.org, pinskia@gmail.com, 
	jeffreyalaw@gmail.com, ian@airs.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-6.5 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

I am not sure this patch is the best fix, but I am fairly sure the
initial RTL is not correct.
The initial RTL on ARM64 looks like this:

(insn 8 7 9 2 (set (zero_extract:SI (reg/v:SI 98 [ val ])
                                                  ^^
            (const_int 8 [0x8])
            (const_int 0 [0]))
        (reg:SI 102)) "xx.c":3:29 -1
     (nil))


On Wed, Jul 19, 2023 at 16:25, YunQiang Su <wzssyqa@gmail.com> wrote:
>
> On Wed, Jul 19, 2023 at 16:21, YunQiang Su <wzssyqa@gmail.com> wrote:
> >
> > On Wed, Jul 19, 2023 at 15:22, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > >
> > > > On Wed, Jul 19, 2023 at 14:27, Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > > > >
> > > > > > PR #104914
> > > > > >
> > > > > > > When working with
> > > > > > >   int val;
> > > > > > >   ((unsigned char*)&val)[3] = *buf;
> > > > > > >   if (val > 0) ...
> > > > > > > the RTX mode is obtained from the REG instead of the SUBREG, so
> > > > > > > D<INS> is used instead of <INS>.  This generates wrong code on
> > > > > > > architectures that sign-extend by default, such as MIPS64.
> > > > > >
> > > > > > > Let's use str_rtx and the mode of str_rtx as the parameters for
> > > > > > > store_integral_bit_field if:
> > > > > > >   the modes of op0 and str_rtx are both integer modes;
> > > > > > >   the mode of op0 is wider than the mode of str_rtx.
> > > > > >
> > > > > > > This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
> > > > > > > mips64el-linux-gnuabi64 without regression.
> > > > >
> > > > > > I still think you are "fixing" this in the wrong place.  The
> > > > > > bugzilla audit trail points to combine and later notes an eventual
> > > > > > expansion issue (but for another testcase/target).
> > > > > >
> > > > > > You have to explain in more detail on what is wrong with the
> > > > > > initial RTL on mips.
> > > > >
> > > >
> > > > In the first RTL file, aka xx.c.256r.expand, the zero_extract RTX is
> > > > like
> > > >
> > > > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> > > >             (const_int 8 [0x8])
> > > >             (const_int 0 [0]))
> > > >         (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> > > >      (nil))
> > > >
> > > > Note that all of the REGs are in DImode. On MIPS64, this expands to
> > > > `DINS` instructions, while in fact we expect an SImode operation
> > > > here, because `val` is an `int` in the C code.
> > > >
> > > > With my patch, the RTX will be like:
> > > >
> > > > (insn 10 9 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
> > > >             (const_int 8 [0x8])
> > > >             (const_int 0 [0]))
> > > >         (subreg:SI (reg:QI 202) 0)) "xx.c":4:29 -1
> > > >      (nil))
> > >
> > > But if this RTL is correct then the above with DImode is correct as
> > > well and the issue is in the backend definition of the instruction
> > > defining 'DINS'?
> > >
> >
> > I don't think so.
> >
> > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> >                                                      ^^
> >              (const_int 8 [0x8])
> >              (const_int 0 [0]))
> >          (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> >       (nil))
> >
> > This RTL only carries DImode information. It says nothing about the
> > real width of `val`, so the backend has no choice other than `DINS`.
> >
> > > > So the operation will be SImode, aka `INS` instruction for MIPS64.
> > > >
> > > > The problem has two root causes:
> > > > 1. MIPS's `INS` instruction always sign-extends its result, while `DINS` doesn't
> > > >     li $7, 0xff
> > > >     li $8, 0
> > > >     ins $8,$7,24,8  # set bits 24-31 of $8 to 0xff.
> > > > The value of $8 will be 0xff ff ff ff ff 00 00 00.
> > >
> > > But that's wrong.  (set (zero_extract:SI ...) should not affect
> > > bits outside of the indicated range.
> > >
> >
> > In fact, this is how sign-extending architectures work.
> > Right or wrong, the ISA is defined like this.
> >
> > In fact, on a MIPS 32-bit ABI, the same C code generates RTL like this,
> > and the 32-bit object still works on a 64-bit CPU.
> > That's a smart (or brain-damaged) design.
> >
> > > @findex zero_extract
> > > @item (zero_extract:@var{m} @var{loc} @var{size} @var{pos})
> > > Like @code{sign_extract} but refers to an unsigned or zero-extended
> > > bit-field.  The same sequence of bits are extracted, but they
> > > are filled to an entire word with zeros instead of by sign-extension.
> > >
> >
> > That depends on the definition of `word` here.
> > For `(zero_extract:SI`, I think the word is limited to the low 32 bits
> > of the hardware register.
> > Anyway, it won't break ISAs that don't sign-extend by default.
> >
> > Due to the nature of a sign-extending ISA, if we don't sign-extend the
> > `int` variable, something will go wrong.
> >
> > To make it clear, `sign extension` here means:
> >        the value of bit 31 is copied to bits [32-63], and
> >        the values of bits [0-30] are not copied.
> > Here are the examples:
> >     li $7, 0xff
> >     li $8, 0x00 00 ff 00
> >     ins $8,$7,16,8
> >                     ^^
> > The value of $8 will be: 0x 00 00 00 00 00 ff ff 00
> >
> >     li $7, 0xff
> >     li $8, 0x00 00 ff 00
> >     ins $8,$7,24,8
> >                     ^^
> > The value of $8 will be: 0x ff ff ff ff ff 00 ff 00
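
The `ins` and `dins` results quoted in this thread can be reproduced with
a small C model.  This is only a sketch under stated assumptions: the
helper names (field_mask, model_dins, model_ins) are hypothetical, it
assumes pos + size <= 32, and it assumes that `ins` yields the 32-bit
result sign-extended to 64 bits, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of `size` one bits starting at bit `pos` (assumes pos + size <= 32).  */
static uint64_t field_mask(unsigned pos, unsigned size)
{
    return ((UINT64_C(1) << size) - 1) << pos;
}

/* Hypothetical model of `dins rt, rs, pos, size`: a plain 64-bit insert;
   bits outside the field are left untouched.  */
static uint64_t model_dins(uint64_t rt, uint64_t rs, unsigned pos, unsigned size)
{
    uint64_t mask = field_mask(pos, size);
    return (rt & ~mask) | ((rs << pos) & mask);
}

/* Hypothetical model of `ins`: the same insert on the low 32 bits, then
   bit 31 is copied into bits 32..63.  */
static uint64_t model_ins(uint64_t rt, uint64_t rs, unsigned pos, unsigned size)
{
    uint64_t low = model_dins(rt, rs, pos, size) & UINT64_C(0xffffffff);
    if (low & UINT64_C(0x80000000))
        low |= UINT64_C(0xffffffff00000000);
    return low;
}

/* Checks against the register values quoted in this thread.  */
static void check_ins_dins(void)
{
    assert(model_ins(0x0000ff00, 0xff, 16, 8) == UINT64_C(0x0000000000ffff00));
    assert(model_ins(0x0000ff00, 0xff, 24, 8) == UINT64_C(0xffffffffff00ff00));
    assert(model_dins(0, 0xff, 24, 8) == UINT64_C(0x00000000ff000000));
}
```

Calling check_ins_dins() from a test driver should pass silently if the
model matches the values given above.
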
> >
> > > Unlike @code{sign_extract}, this type of expressions can be lvalues
> > > in RTL; they may appear on the left side of an assignment, indicating
> > > insertion of a value into the specified bit-field.
> > > @end table
> > >
> > >
> > > >     li $7, 0xff
> > > >     li $8, 0
> > > >     dins $8,$7,24,8  # set bits 24-31 of $8 to 0xff.
> > > > The value of $8 will be 0x 00 00 00 00 ff 00 00 00.
> > >
> > > which isn't correct either.
> > >
> >
> > It is not a matter of correct or incorrect: the ISA manual states
> > this, and the hardware works this way.
> >
> > > If you look a few dumps further you'll see which instruction was
> > > recognized, I suspect the machine description is simply wrong here?
> > >
> >
> > The design of the initial RTL may have expected the backend to expand
> >
> > (insn 14 13 15 2 (set (reg/v:DI 201 [ val ])
> >         (sign_extend:DI (subreg:SI (reg/v:DI 201 [ val ]) 0))) "xx.c":5:29 -1
> >      (nil))
> >
> > to an `SLL` instruction, which can fix up what `DINS` does, i.e.
> >      0x 00 00 00 00 ff 00 00 00 ---> 0x ff ff ff ff ff 00 00 00
> >
> > I guess this is what you mean by a mistake in the machine description.
> > However, the MIPS md assumes values are sign-extended by default, so it
> > considers this instruction unnecessary.
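
The repair step described above can be sketched in C as well.  This is a
minimal model, assuming that `sll rd, rt, 0` on MIPS64 keeps the low 32
bits and sign-extends them; the name model_sll0 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of MIPS64 `sll rd, rt, 0`: keep the low 32 bits and
   copy bit 31 into bits 32..63.  */
static uint64_t model_sll0(uint64_t rt)
{
    uint64_t low = rt & UINT64_C(0xffffffff);
    if (low & UINT64_C(0x80000000))
        low |= UINT64_C(0xffffffff00000000);
    return low;
}

static void check_sll0(void)
{
    /* The DINS result from this thread, repaired by the extra SLL.  */
    assert(model_sll0(UINT64_C(0x00000000ff000000))
           == UINT64_C(0xffffffffff000000));
    /* A value with bit 31 clear is left unchanged.  */
    assert(model_sll0(UINT64_C(0x0000000000ffff00))
           == UINT64_C(0x0000000000ffff00));
}
```

If the backend drops this instruction, the register keeps the
non-sign-extended image on the left-hand side of the first check.
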
> >
> > > > 2. Because most MIPS instructions that work on 32-bit values, i.e.
> > > > instructions without `d` as their first character (with a few
> > > > exceptions), sign-extend their results, the MIPS backend simply
> > > > ignores `extendsidi2`, i.e. the RTX
> > > >
> > > > (insn 14 13 15 2 (set (reg/v:DI 200 [ val ])
> > > >         (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0))) "xx.c":5:29 -1
> > > >      (nil))
> > > >
> > > >
> >
> > This is just background info about MIPS:
> >
> > On MIPS32 hardware, the value -1 is 0x ff ff ff ff, the same as on
> > other architectures.
> > On MIPS64 hardware, the value of (int32_t)-1 is
> >      0x ff ff ff ff ff ff ff ff
> > which is the same as (int64_t)-1.
> > So a single compare-and-branch instruction can work with both
> > int32_t and int64_t.
> >
> > On a non-sign-extending arch, like ARM64, (int32_t)-1 is
> >    0x 00 00 00 00 ff ff ff ff
> > and (int64_t)-1 is
> >    0x ff ff ff ff ff ff ff ff
> > That's why the `CMP` instructions for X and W registers have different
> > encodings: bit 31 of the encoding, the `sf` bit, differs.
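
To illustrate why the register image matters for `val > 0`, here is a
small C sketch.  The function names are hypothetical, and gt_zero_64 only
models a full 64-bit signed "greater than zero" test; it is not taken
from any backend.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit register image of a 32-bit int on a sign-extending port.  */
static uint64_t reg_image_sign_extended(int32_t v)
{
    return (uint64_t)(int64_t)v;
}

/* Register image when the upper 32 bits are zero instead.  */
static uint64_t reg_image_zero_extended(int32_t v)
{
    return (uint64_t)(uint32_t)v;
}

/* A 64-bit signed "greater than zero" test: non-zero and bit 63 clear.  */
static int gt_zero_64(uint64_t reg)
{
    return reg != 0 && (reg >> 63) == 0;
}

static void check_images(void)
{
    assert(reg_image_sign_extended(-1) == UINT64_C(0xffffffffffffffff));
    assert(reg_image_zero_extended(-1) == UINT64_C(0x00000000ffffffff));
    /* Sign-extended image: the 64-bit test agrees with the int value.  */
    assert(gt_zero_64(reg_image_sign_extended(-1)) == 0);
    /* Zero-extended image: the 64-bit test wrongly reports "positive".  */
    assert(gt_zero_64(reg_image_zero_extended(-1)) == 1);
}
```

The last check is the failure mode discussed in this thread: a 64-bit
compare on a register that is not sign-extended sees a negative `int` as
positive.
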
> >
> >
> > ================================
> > For this problem, we have two ways to fix it:
> > 1. This patch
> > 2.
> >
>
> ================================
> For this problem, we have two ways to fix it:
> 1. This patch
> 2. Make the MIPS backend always expand the RTL below to `SLL`
> (insn 14 13 15 2 (set (reg/v:DI 201 [ val ])
>         (sign_extend:DI (subreg:SI (reg/v:DI 201 [ val ]) 0)))
> "xx.c":5:29 238 {extendsidi2}
>      (nil))
>
>
> >
> > > >
> > > > > Richard.
> > > > >
> > > > > > gcc/ChangeLog:
> > > > > >         PR: 104914.
> > > > > > >         * expmed.cc(store_bit_field_1): Pass str_rtx and its mode
> > > > > > >       to store_integral_bit_field if the length of op0 is greater
> > > > > > >       than str_rtx.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >         PR: 104914.
> > > > > >       * gcc.target/mips/pr104914.c: New testcase.
> > > > > > ---
> > > > > > >  gcc/expmed.cc                            | 20 +++++++++++++++++---
> > > > > > >  gcc/testsuite/gcc.target/mips/pr104914.c | 17 +++++++++++++++++
> > > > > >  2 files changed, 34 insertions(+), 3 deletions(-)
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c
> > > > > >
> > > > > > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> > > > > > index fbd4ce2d42f..5531c19e891 100644
> > > > > > --- a/gcc/expmed.cc
> > > > > > +++ b/gcc/expmed.cc
> > > > > > @@ -850,6 +850,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64=
 bitsize, poly_uint64 bitnum,
> > > > > >       since that case is valid for any mode.  The following cas=
es are only
> > > > > >       valid for integral modes.  */
> > > > > >    opt_scalar_int_mode op0_mode =3D int_mode_for_mode (GET_MODE=
 (op0));
> > > > > > +  opt_scalar_int_mode str_mode =3D int_mode_for_mode (GET_MODE=
 (str_rtx));
> > > > > >    scalar_int_mode imode;
> > > > > >    if (!op0_mode.exists (&imode) || imode !=3D GET_MODE (op0))
> > > > > >      {
> > > > > > @@ -881,9 +882,22 @@ store_bit_field_1 (rtx str_rtx, poly_uint6=
4 bitsize, poly_uint64 bitnum,
> > > > > >       op0 =3D gen_lowpart (op0_mode.require (), op0);
> > > > > >      }
> > > > > >
> > > > > > -  return store_integral_bit_field (op0, op0_mode, ibitsize, ib=
itnum,
> > > > > > -                                bitregion_start, bitregion_end=
,
> > > > > > -                                fieldmode, value, reverse, fal=
lback_p);
> > > > > > +  /* If MODEs of str_rtx and op0 are INT, and the length of op=
0 is greater than
> > > > > > +     str_rtx, it means that str_rtx has a shorter SUBREG: int3=
2 on 64 mach/ABI
> > > > > > +     is an example.  For this case, we should use the mode of =
SUBREG, otherwise
> > > > > > +     bad code will generate for sign-extension ports, like MIP=
S.  */
> > > > > > +  bool use_str_mode =3D false;
> > > > > > +  if (GET_MODE_CLASS (GET_MODE (str_rtx)) =3D=3D MODE_INT
> > > > > > +      && GET_MODE_CLASS (GET_MODE (op0)) =3D=3D MODE_INT
> > > > > > +      && known_gt (GET_MODE_SIZE (GET_MODE (op0)),
> > > > > > +                GET_MODE_SIZE (GET_MODE (str_rtx))))
> > > > > > +    use_str_mode =3D true;
> > > > > > +
> > > > > > +  return store_integral_bit_field (use_str_mode ? str_rtx : op=
0,
> > > > > > +                                use_str_mode ? str_mode : op0_=
mode,
> > > > > > +                                ibitsize, ibitnum, bitregion_s=
tart,
> > > > > > +                                bitregion_end, fieldmode, valu=
e,
> > > > > > +                                reverse, fallback_p);
> > > > > >  }
> > > > > >
> > > > > > >  /* Subroutine of store_bit_field_1, with the same arguments, except
> > > > > > > diff --git a/gcc/testsuite/gcc.target/mips/pr104914.c b/gcc/testsuite/gcc.target/mips/pr104914.c
> > > > > > new file mode 100644
> > > > > > index 00000000000..fd6ef6af446
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/mips/pr104914.c
> > > > > > @@ -0,0 +1,17 @@
> > > > > > +/* { dg-do compile } */
> > > > > > > +/* { dg-options "-march=mips64r2 -mabi=64" } */
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "\tdins\t" } } */
> > > > > > +
> > > > > > +NOMIPS16 int test (const unsigned char *buf)
> > > > > > +{
> > > > > > +  int val;
> > > > > > > +  ((unsigned char*)&val)[0] = *buf++;
> > > > > > > +  ((unsigned char*)&val)[1] = *buf++;
> > > > > > > +  ((unsigned char*)&val)[2] = *buf++;
> > > > > > > +  ((unsigned char*)&val)[3] = *buf++;
> > > > > > +  if(val > 0)
> > > > > > +    return 1;
> > > > > > +  else
> > > > > > +    return 0;
> > > > > > +}
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> > > > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> > > > > > HRB 36809 (AG Nuernberg)
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> > > > HRB 36809 (AG Nuernberg)
> >
> >
> >
> > --
> > YunQiang Su
>
>
>
> --
> YunQiang Su


--=20
YunQiang Su