Subject: Ping ^ 2: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]
To: Segher Boessenkool
Cc: gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com, linkw@gcc.gnu.org, dje.gcc@gmail.com
References: <20210524090213.2813103-1-luoxhu@linux.ibm.com> <20210608232543.GC18427@gate.crashing.org> <7daea8f2-c0f4-f2e0-eca1-6cfc7496600d@linux.ibm.com>
From: Xionghu Luo
Message-ID: <7bce16f3-ab02-b26f-57d1-c9da93b9f7b8@linux.ibm.com>
Date: Mon, 6 Sep 2021 08:54:06 +0800

Ping^2, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html

On 2021/6/30 09:47, Xionghu Luo via Gcc-patches wrote:
> Gentle ping, thanks.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
>
>
> On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote:
>> Hi,
>>
>> On 2021/6/9 07:25, Segher Boessenkool wrote:
>>> On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:
>>>> vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
>>>> 5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for
>>>> vmrghlb.
>>>
>>> (vmrglb)
>>>
>>>> +  if (BYTES_BIG_ENDIAN)
>>>> +    emit_insn (
>>>> +      gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
>>>> +  else
>>>> +    emit_insn (
>>>> +      gen_altivec_vmrglb_direct (operands[0], operands[2], operands[1]));
>>>
>>> Please don't indent like that, it doesn't match what we do elsewhere.
>>> For better or for worse (for worse imo), we use deep hanging indents.
>>> If you have to, you can do something like
>>>
>>>    rtx insn;
>>>    if (BYTES_BIG_ENDIAN)
>>>      insn = gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]);
>>>    else
>>>      insn = gen_altivec_vmrglb_direct (operands[0], operands[2], operands[1]);
>>>    emit_insn (insn);
>>>
>>> (this is better even, in that it has only one emit_insn), or even
>>>
>>>    rtx (*fun) () = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>>>                    : gen_altivec_vmrglb_direct;
>>>    if (!BYTES_BIG_ENDIAN)
>>>      std::swap (operands[1], operands[2]);
>>>    emit_insn (fun (operands[0], operands[1], operands[2]));
>>>
>>> Well, C++ does not allow that last example like that, sigh, so
>>>    rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ?
>>> gen_altivec_vmrghb_direct
>>>                         : gen_altivec_vmrglb_direct;
>>>
>>> This is shorter than the other two options ;-)
>>
>> Changed.
>>
>>>
>>>> +(define_insn "altivec_vmrghb_direct"
>>>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>>> +    (vec_select:V16QI
>>>
>>> This should be indented one space more.
>>>
>>>>     "TARGET_ALTIVEC"
>>>>     "@
>>>> -   xxmrghw %x0,%x1,%x2
>>>> -   vmrghw %0,%1,%2"
>>>> +  xxmrghw %x0,%x1,%x2
>>>> +  vmrghw %0,%1,%2"
>>>
>>> The original indent was correct, please restore.
>>>
>>>> -      emit_insn (gen_altivec_vmrghw_direct (operands[0], ve, vo));
>>>> +      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>>>
>>> When you see a mode as part of a pattern name, chances are that it will
>>> be a good candidate for using parameterized names with.  (But don't do
>>> that now, just keep it in mind as a nice cleanup to do).
>>
>> OK.
>>
>>>
>>>> @@ -23022,8 +23022,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>>>          : CODE_FOR_altivec_vmrglh_direct),
>>>>         {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
>>>>       { OPTION_MASK_ALTIVEC,
>>>> -      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct
>>>> -       : CODE_FOR_altivec_vmrglw_direct),
>>>> +      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>>>> +       : CODE_FOR_altivec_vmrglw_direct_v4si),
>>>
>>> The correct way is to align the ? and the : (or put everything on one
>>> line of course, if that fits)
>>>
>>> The parens around this are not needed btw, and are a distraction.
>>
>> Changed.
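
For reference, a minimal standalone sketch of the selection idiom discussed above: pick the generator through a function pointer, swap the inputs on little-endian, and emit once. Everything in it is a stand-in and an assumption for illustration: `Vec`, `vmrghb`, `vmrglb`, `merge_high` and `matches_permute` are hypothetical models, not GCC's generators. The byte semantics follow the ISA's big-endian byte numbering, and the check assumes little-endian element i sits at big-endian byte 15 - i.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

using Vec = std::array<std::uint8_t, 16>;

// Model of the vmrghb instruction, bytes numbered big-endian as in the ISA:
// interleave bytes 0..7 of a and b.
Vec vmrghb (const Vec &a, const Vec &b)
{
  Vec r{};
  for (int i = 0; i < 8; ++i)
    {
      r[2 * i] = a[i];
      r[2 * i + 1] = b[i];
    }
  return r;
}

// Model of vmrglb: interleave bytes 8..15 of a and b.
Vec vmrglb (const Vec &a, const Vec &b)
{
  Vec r{};
  for (int i = 0; i < 8; ++i)
    {
      r[2 * i] = a[i + 8];
      r[2 * i + 1] = b[i + 8];
    }
  return r;
}

// Hypothetical stand-in for the expander: choose the instruction by byte
// order, swap the inputs on little-endian, call once.
Vec merge_high (bool big_endian, Vec op1, Vec op2)
{
  Vec (*fun) (const Vec &, const Vec &) = big_endian ? vmrghb : vmrglb;
  if (!big_endian)
    std::swap (op1, op2);
  return fun (op1, op2);
}

// Compare against the permute {0, 16, 1, 17, ..., 7, 23} applied in the
// target's own element numbering (element i is byte i on BE, 15 - i on LE).
bool matches_permute (bool big_endian)
{
  Vec a{}, b{};
  for (int i = 0; i < 16; ++i)
    {
      a[i] = static_cast<std::uint8_t> (i);
      b[i] = static_cast<std::uint8_t> (100 + i);
    }
  const Vec r = merge_high (big_endian, a, b);
  auto pos = [big_endian] (int i) { return big_endian ? i : 15 - i; };
  for (int i = 0; i < 8; ++i)
    if (r[pos (2 * i)] != a[pos (i)] || r[pos (2 * i + 1)] != b[pos (i)])
      return false;
  return true;
}

// Convenience check for both byte orders.
void self_check ()
{
  assert (matches_permute (true));
  assert (matches_permute (false));
}
```

Calling `self_check ()` exercises both byte orders. Under this model the little-endian case only works out because the operands are swapped before the low-half merge is used.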
>>
>>>
>>>> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>>>> @@ -317,10 +317,10 @@ int main ()
>>>>   /* { dg-final { scan-assembler-times "vctuxs" 2 } } */
>>>>   /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */
>>>> -/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */
>>>> +/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */
>>>>   /* { dg-final { scan-assembler-times "vmrghh" 8 } } */
>>>> -/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */
>>>> -/* { dg-final { scan-assembler-times "xxmrglw" 8 } } */
>>>> +/* { dg-final { scan-assembler-times "xxmrghw" 4 } } */
>>>> +/* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
>>>>   /* { dg-final { scan-assembler-times "vmrglh" 8 } } */
>>>>   /* { dg-final { scan-assembler-times "xxlnor" 6 } } */
>>>>   /* { dg-final { scan-assembler-times {\mvpkudus\M} 1 } } */
>>>> @@ -347,7 +347,7 @@ int main ()
>>>>   /* { dg-final { scan-assembler-times "vspltb" 6 } } */
>>>>   /* { dg-final { scan-assembler-times "vspltw" 0 } } */
>>>>   /* { dg-final { scan-assembler-times "vmrgow" 8 } } */
>>>> -/* { dg-final { scan-assembler-times "vmrglb" 5 { target le } } } */
>>>> +/* { dg-final { scan-assembler-times "vmrglb" 4 { target le } } } */
>>>>   /* { dg-final { scan-assembler-times "vmrglb" 6 { target be } } } */
>>>>   /* { dg-final { scan-assembler-times "vmrgew" 8 } } */
>>>>   /* { dg-final { scan-assembler-times "vsplth" 8 } } */
>>>
>>> Are those changes correct?  It looks like a vmrglb became a vmrghb, and
>>> that 4 each of xxmrghw and xxmrglw disappeared?  Both seem wrong?
>>
>>
>> This case is built with "-mdejagnu-cpu=power8 -O0 -mno-fold-gimple -dp"
>> and it also counted the generated instruction patterns.
>>
>> 1) "vsx_xxmrghw_v4si" is replaced by "altivec_vmrglw_direct_v4si/0",
>> so it decreases from 8 to 4. (Likewise for vsx_xxmrglw_v4si.)
>>
>>          li 9,48          # 1282 [c=4 l=4]  *movdi_internal64/3
>> -       lxvd2x 0,31,9    # 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
>> -       xxpermdi 0,0,0,2         # 32   [c=4 l=4]  xxswapd_v4si
>> -       xxmrglw 0,0,12   # 33   [c=4 l=4]  vsx_xxmrghw_v4si
>> +       lxvd2x 12,31,9   # 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
>> +       xxpermdi 12,12,12,2      # 32   [c=4 l=4]  xxswapd_v4si
>> +       xxmrglw 0,12,0   # 33   [c=4 l=4]  altivec_vmrglw_direct_v4si/0
>>          xxpermdi 0,0,0,2         # 35   [c=4 l=4]  xxswapd_v4sf
>>
>> Note that v0 and v12 are swapped in lxvd2x; these 3 new instructions
>> produce the same result as before.
>>
>> 2) "*altivec_vmrglb_internal" is replaced by "altivec_vmrghb_direct"
>> with this patch, so the vmrglb count decreases from 5 to 4 and vmrghb
>> increases from 5 to 6. (BYTES_BIG_ENDIAN is now checked early, at RTL
>> generation instead of at final, which removes the UNSPECs and allows
>> potential optimization in the backend.)
>>
>>          li 9,928                 # 1424 [c=4 l=4]  *movdi_internal64/3
>>          lxvd2x 0,31,9    # 416  [c=8 l=4]  *vsx_lxvd2x16_le_V16QI
>> -       xxpermdi 33,0,0,2        # 417  [c=4 l=4]  xxswapd_v16qi
>> +       xxpermdi 32,0,0,2        # 417  [c=4 l=4]  xxswapd_v16qi
>>          li 9,944                 # 1425 [c=4 l=4]  *movdi_internal64/3
>>          lxvd2x 0,31,9    # 418  [c=8 l=4]  *vsx_lxvd2x16_le_V16QI
>> -       xxpermdi 32,0,0,2        # 419  [c=4 l=4]  xxswapd_v16qi
>> -       vmrghb 0,0,1     # 420  [c=4 l=4]  *altivec_vmrglb_internal
>> +       xxpermdi 33,0,0,2        # 419  [c=4 l=4]  xxswapd_v16qi
>> +       vmrghb 0,1,0     # 420  [c=4 l=4]  altivec_vmrghb_direct
>>          xxpermdi 0,32,32,2       # 421  [c=4 l=4]  xxswapd_v16qi
>>
>> It seems unnecessary to also use \m and \M here to count only the ASM?
>> Updated patch attached.
>>
>
--
Thanks,
Xionghu
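
A rough illustration of the counting effect described in 1) above: with -dp each line carries the insn pattern name as a trailing comment, so an unanchored pattern can match the name as well as the mnemonic. The sketch below is only an approximation under stated assumptions: `count_matches` and `before_asm` are hypothetical names, the sample is three lines of the old output with whitespace compacted, and `std::regex`'s `\b` stands in for Tcl's `\m`/`\M` word anchors.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count non-overlapping matches of `pattern` in `text`.
std::size_t count_matches (const std::string &text, const std::string &pattern)
{
  const std::regex re (pattern);
  return static_cast<std::size_t> (
    std::distance (std::sregex_iterator (text.begin (), text.end (), re),
                   std::sregex_iterator ()));
}

// Three lines of the old -dp output quoted above (whitespace compacted).
const std::string before_asm =
  "lxvd2x 0,31,9 # 31 [c=8 l=4] *vsx_lxvd2x4_le_v4si\n"
  "xxpermdi 0,0,0,2 # 32 [c=4 l=4] xxswapd_v4si\n"
  "xxmrglw 0,0,12 # 33 [c=4 l=4] vsx_xxmrghw_v4si\n";
```

On this sample, `count_matches (before_asm, "xxmrghw")` finds the string only inside the pattern name `vsx_xxmrghw_v4si`, while the anchored `R"(\bxxmrghw\b)"` does not match there because `_` counts as a word character. Both forms still match the `xxmrglw` mnemonic at the start of the third line.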