From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dje.gcc@gmail.com>
Received: from mail-ua1-x92c.google.com (mail-ua1-x92c.google.com
 [IPv6:2607:f8b0:4864:20::92c])
 by sourceware.org (Postfix) with ESMTPS id 3CBE53858C60;
 Tue, 12 Oct 2021 22:51:35 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3CBE53858C60
Received: by mail-ua1-x92c.google.com with SMTP id e7so1513199ual.11;
 Tue, 12 Oct 2021 15:51:35 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=n8QGW7jkLT4Vo7aWrW1BWtgN4U1rw9CamKJfQ6WxvxI=;
 b=7xVj7BXNgzCL9ytctw8yzUUZNUv2nNm9TaB2ikxnXWMY2wbQf6Ka6RCzwck9znMiZc
 7YwysOFXZ0ls6/0HcUhnDJjHGSuZ/ydNQ+S7wiqMJ9PyLyXDKrztU7bsqcOyZUyk50bH
 BW7/QYx81brCr+N9xdKgHMHkoX9qB6V13cheUe8dW270UHAAiyGXTMifqOgq0C92WClF
 OXA45eCoGbjrs+frvbOiZLUMSijWKo2p6aO6sbD+zB7PAPNOdBrunqOm7ac0p5kxYj2F
 a/YYZiwznTK8H1SopfTKJ+LHUtJxelHCLuUI9dmkIaekA4nqJUsRGTwfqLO+jas+R0c6
 p/OQ==
X-Gm-Message-State: AOAM530A7BJrBZPia1TSiAAcs606uCDGmfk96yjZGGROk2mGNjr88lCf
 Byy6LeytHL3raOUJC/PxwmM+a8JdA/GOQQeyeSl+g7ln
X-Google-Smtp-Source: ABdhPJxuCvIZDTfIiA82TYTDJdoXBTD5E6Pg95X9ZrOlhVOdHwXDj1IMU+Ef1/Qpf+mqwQK6ZTnIqIEhqiS9AL669M4=
X-Received: by 2002:a67:e28a:: with SMTP id g10mr6814458vsf.5.1634079094575;
 Tue, 12 Oct 2021 15:51:34 -0700 (PDT)
MIME-Version: 1.0
References: <20210524090213.2813103-1-luoxhu@linux.ibm.com>
 <20210608232543.GC18427@gate.crashing.org>
 <7daea8f2-c0f4-f2e0-eca1-6cfc7496600d@linux.ibm.com>
 <fe8441fa-3ba9-e427-3377-bc7d8ff44a12@linux.ibm.com>
 <7bce16f3-ab02-b26f-57d1-c9da93b9f7b8@linux.ibm.com>
In-Reply-To: <7bce16f3-ab02-b26f-57d1-c9da93b9f7b8@linux.ibm.com>
From: David Edelsohn <dje.gcc@gmail.com>
Date: Tue, 12 Oct 2021 18:51:23 -0400
Message-ID: <CAGWvnyk_7-3J-PKPYRGi=3P-sC+c+hLTiFhcD2YDCXjiaOyP5A@mail.gmail.com>
Subject: Re: Ping ^ 2: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]
To: Xionghu Luo <luoxhu@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
 GCC Patches <gcc-patches@gcc.gnu.org>, 
 Bill Schmidt <wschmidt@linux.ibm.com>, linkw@gcc.gnu.org
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 12 Oct 2021 22:51:37 -0000

Hi, Xionghu

What's the status of the \M and \m testcase beautification requested
by Segher?  Did you send an updated patch? Your messages ping the
version prior to Segher's additional comments.

It seems that the changes to the patterns are complete, but there are
remaining questions about the testcase style and if the instruction
counts are ideal. I trust that the instruction counts match the
behavior after the patch, but it seemed that Segher wanted to confirm
that the counts are the values desired / expected from optimal code
generation.  The counts are the total for the file, which doesn't
communicate if the sequences themselves are optimal.

Thanks, David

On Sun, Sep 5, 2021 at 8:54 PM Xionghu Luo <luoxhu@linux.ibm.com> wrote:
>
> Ping^2, thanks.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
>
>
> On 2021/6/30 09:47, Xionghu Luo via Gcc-patches wrote:
> > Gentle ping, thanks.
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
> >
> >
> > On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote:
> >> Hi,
> >>
> >> On 2021/6/9 07:25, Segher Boessenkool wrote:
> >>> On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:
> >>>> vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
> >>>> 5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for
> >>>> vmrghlb.
> >>>
> >>> (vmrglb)
> >>>
> >>>> +  if (BYTES_BIG_ENDIAN)
> >>>> +    emit_insn (
> >>>> +      gen_altivec_vmrghb_direct (operands[0], operands[1],
> >>>> operands[2]));
> >>>> +  else
> >>>> +    emit_insn (
> >>>> +      gen_altivec_vmrglb_direct (operands[0], operands[2],
> >>>> operands[1]));
> >>>
> >>> Please don't indent like that, it doesn't match what we do elsewhere.
> >>> For better or for worse (for worse imo), we use deep hanging indents.
> >>> If you have to, you can do something like
> >>>
> >>>    rtx insn;
> >>>    if (BYTES_BIG_ENDIAN)
> >>>      insn = gen_altivec_vmrghb_direct (operands[0], operands[1],
> >>> operands[2]);
> >>>    else
> >>>      insn = gen_altivec_vmrglb_direct (operands[0], operands[2],
> >>> operands[1]);
> >>>    emit_insn (insn);
> >>>
> >>> (this is better even, in that it has only one emit_insn), or even
> >>>
> >>>    rtx (*fun) () = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> >>>                    : gen_altivec_vmrglb_direct;
> >>>    if (!BYTES_BIG_ENDIAN)
> >>>      std::swap (operands[1], operands[2]);
> >>>    emit_insn (fun (operands[0], operands[1], operands[2]));
> >>>
> >>> Well, C++ does not allow that last example like that, sigh, so
> >>>    rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ?
> >>> gen_altivec_vmrghb_direct
> >>>                         : gen_altivec_vmrglb_direct;
> >>>
> >>> This is shorter than the other two options ;-)
> >>
> >> Changed.
> >>
> >>>
> >>>> +(define_insn "altivec_vmrghb_direct"
> >>>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
> >>>> +    (vec_select:V16QI
> >>>
> >>> This should be indented one space more.
> >>>
> >>>>     "TARGET_ALTIVEC"
> >>>>     "@
> >>>> -   xxmrghw %x0,%x1,%x2
> >>>> -   vmrghw %0,%1,%2"
> >>>> +  xxmrghw %x0,%x1,%x2
> >>>> +  vmrghw %0,%1,%2"
> >>>
> >>> The original indent was correct, please restore.
> >>>
> >>>> -      emit_insn (gen_altivec_vmrghw_direct (operands[0], ve, vo));
> >>>> +      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve,
> >>>> vo));
> >>>
> >>> When you see a mode as part of a pattern name, chances are that it will
> >>> be a good candidate for using parameterized names with.  (But don't do
> >>> that now, just keep it in mind as a nice cleanup to do).
> >>
> >> OK.
> >>
> >>>
> >>>> @@ -23022,8 +23022,8 @@ altivec_expand_vec_perm_const (rtx target,
> >>>> rtx op0, rtx op1,
> >>>>          : CODE_FOR_altivec_vmrglh_direct),
> >>>>         {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7,
> >>>> 22, 23 } },
> >>>>       { OPTION_MASK_ALTIVEC,
> >>>> -      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct
> >>>> -       : CODE_FOR_altivec_vmrglw_direct),
> >>>> +      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
> >>>> +       : CODE_FOR_altivec_vmrglw_direct_v4si),
> >>>
> >>> The correct way is to align the ? and the : (or put everything on one
> >>> line of course, if that fits)
> >>>
> >>> The parens around this are not needed btw, and are a distraction.
> >>
> >> Changed.
> >>
> >>>
> >>>> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> >>>> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> >>>> @@ -317,10 +317,10 @@ int main ()
> >>>>   /* { dg-final { scan-assembler-times "vctuxs" 2 } } */
> >>>>   /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */
> >>>> -/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */
> >>>> +/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */
> >>>>   /* { dg-final { scan-assembler-times "vmrghh" 8 } } */
> >>>> -/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */
> >>>> -/* { dg-final { scan-assembler-times "xxmrglw" 8 } } */
> >>>> +/* { dg-final { scan-assembler-times "xxmrghw" 4 } } */
> >>>> +/* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
> >>>>   /* { dg-final { scan-assembler-times "vmrglh" 8 } } */
> >>>>   /* { dg-final { scan-assembler-times "xxlnor" 6 } } */
> >>>>   /* { dg-final { scan-assembler-times {\mvpkudus\M} 1 } } */
> >>>> @@ -347,7 +347,7 @@ int main ()
> >>>>   /* { dg-final { scan-assembler-times "vspltb" 6 } } */
> >>>>   /* { dg-final { scan-assembler-times "vspltw" 0 } } */
> >>>>   /* { dg-final { scan-assembler-times "vmrgow" 8 } } */
> >>>> -/* { dg-final { scan-assembler-times "vmrglb" 5 { target le } } } */
> >>>> +/* { dg-final { scan-assembler-times "vmrglb" 4 { target le } } } */
> >>>>   /* { dg-final { scan-assembler-times "vmrglb" 6 { target be } } } */
> >>>>   /* { dg-final { scan-assembler-times "vmrgew" 8 } } */
> >>>>   /* { dg-final { scan-assembler-times "vsplth" 8 } } */
> >>>
> >>> Are those changes correct?  It looks like a vmrglb became a vmrghb, and
> >>> that 4 each of xxmrghw and xxmrglw disappeared?  Both seem wrong?
> >>
> >>
> >> This case is built with "-mdejagnu-cpu=power8 -O0 -mno-fold-gimple -dp"
> >> and it also counted the generated instruction patterns.
> >>
> >> 1) "vsx_xxmrghw_v4si" is replaced by "altivec_vmrglw_direct_v4si/0",
> >> so it decreases from 8 to 4. (Likewise for vsx_xxmrglw_v4si.)
> >>
> >>          li 9,48          # 1282 [c=4 l=4]  *movdi_internal64/3
> >> -       lxvd2x 0,31,9    # 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
> >> -       xxpermdi 0,0,0,2         # 32   [c=4 l=4]  xxswapd_v4si
> >> -       xxmrglw 0,0,12   # 33   [c=4 l=4]  vsx_xxmrghw_v4si
> >> +       lxvd2x 12,31,9   # 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
> >> +       xxpermdi 12,12,12,2      # 32   [c=4 l=4]  xxswapd_v4si
> >> +       xxmrglw 0,12,0   # 33   [c=4 l=4]  altivec_vmrglw_direct_v4si/0
> >>          xxpermdi 0,0,0,2         # 35   [c=4 l=4]  xxswapd_v4sf
> >>
> >> Note that v0 and v12 is swapped in lxvd2x, these new 3 instructions
> >> produces same result than before.
> >>
> >> 2) "*altivec_vmrglb_internal" is replaced by "altivec_vmrghb_direct"
> >> with this patch, then vmrglb count decreases from 5 to 4 and vmrghb
> >> increases from 5 to 6. (BYTES_BIG_ENDIAN is checked early in RTL
> >> generation instead of final to remove the UNSPECs for potential
> >> optimization through backend.)
> >>
> >>          li 9,928                 # 1424 [c=4 l=4]  *movdi_internal64/3
> >>          lxvd2x 0,31,9    # 416  [c=8 l=4]  *vsx_lxvd2x16_le_V16QI
> >> -       xxpermdi 33,0,0,2        # 417  [c=4 l=4]  xxswapd_v16qi
> >> +       xxpermdi 32,0,0,2        # 417  [c=4 l=4]  xxswapd_v16qi
> >>          li 9,944                 # 1425 [c=4 l=4]  *movdi_internal64/3
> >>          lxvd2x 0,31,9    # 418  [c=8 l=4]  *vsx_lxvd2x16_le_V16QI
> >> -       xxpermdi 32,0,0,2        # 419  [c=4 l=4]  xxswapd_v16qi
> >> -       vmrghb 0,0,1     # 420  [c=4 l=4]  *altivec_vmrglb_internal
> >> +       xxpermdi 33,0,0,2        # 419  [c=4 l=4]  xxswapd_v16qi
> >> +       vmrghb 0,1,0     # 420  [c=4 l=4]  altivec_vmrghb_direct
> >>          xxpermdi 0,32,32,2       # 421  [c=4 l=4]  xxswapd_v16qi
> >>
> >> Seems not necessary to also use \m and \M here to count only ASM here?
> >> Update the patch as attached.
> >>
> >
>
> --
> Thanks,
> Xionghu