From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <luoxhu@linux.ibm.com>
Subject: Ping: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: wschmidt@linux.ibm.com, gcc-patches@gcc.gnu.org, linkw@gcc.gnu.org,
 dje.gcc@gmail.com
References: <20210524090213.2813103-1-luoxhu@linux.ibm.com>
 <20210608232543.GC18427@gate.crashing.org>
 <7daea8f2-c0f4-f2e0-eca1-6cfc7496600d@linux.ibm.com>
From: Xionghu Luo <luoxhu@linux.ibm.com>
Message-ID: <fe8441fa-3ba9-e427-3377-bc7d8ff44a12@linux.ibm.com>
Date: Wed, 30 Jun 2021 09:47:27 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.0; rv:68.0)
 Gecko/20100101 Thunderbird/68.12.0
In-Reply-To: <7daea8f2-c0f4-f2e0-eca1-6cfc7496600d@linux.ibm.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Proofpoint-UnRewURL: 0 URL was un-rewritten
MIME-Version: 1.0
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Jun 2021 01:47:38 -0000

Gentle ping, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html


On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote:
> Hi,
> 
> On 2021/6/9 07:25, Segher Boessenkool wrote:
>> On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:
>>> vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
>>> 5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for vmrghlb.
>>
>> (vmrglb)
>>
>>> +  if (BYTES_BIG_ENDIAN)
>>> +    emit_insn (
>>> +      gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
>>> +  else
>>> +    emit_insn (
>>> +      gen_altivec_vmrglb_direct (operands[0], operands[2], operands[1]));
>>
>> Please don't indent like that, it doesn't match what we do elsewhere.
>> For better or for worse (for worse imo), we use deep hanging indents.
>> If you have to, you can do something like
>>
>>    rtx insn;
>>    if (BYTES_BIG_ENDIAN)
>>      insn = gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]);
>>    else
>>      insn = gen_altivec_vmrglb_direct (operands[0], operands[2], operands[1]);
>>    emit_insn (insn);
>>
>> (this is better even, in that it has only one emit_insn), or even
>>
>>    rtx (*fun) () = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>>                    : gen_altivec_vmrglb_direct;
>>    if (!BYTES_BIG_ENDIAN)
>>      std::swap (operands[1], operands[2]);
>>    emit_insn (fun (operands[0], operands[1], operands[2]));
>>
>> Well, C++ does not allow that last example like that, sigh, so
>>    rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>>                                                  : gen_altivec_vmrglb_direct;
>>
>> This is shorter than the other two options ;-)
> 
> Changed.
> 
>>
>>> +(define_insn "altivec_vmrghb_direct"
>>>     [(set (match_operand:V16QI 0 "register_operand" "=v")
>>> +    (vec_select:V16QI
>>
>> This should be indented one space more.
>>
>>>     "TARGET_ALTIVEC"
>>>     "@
>>> -   xxmrghw %x0,%x1,%x2
>>> -   vmrghw %0,%1,%2"
>>> +  xxmrghw %x0,%x1,%x2
>>> +  vmrghw %0,%1,%2"
>>
>> The original indent was correct, please restore.
>>
>>> -      emit_insn (gen_altivec_vmrghw_direct (operands[0], ve, vo));
>>> +      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
>>
>> When you see a mode as part of a pattern name, chances are that it will
>> be a good candidate for using parameterized names with.  (But don't do
>> that now, just keep it in mind as a nice cleanup to do).
> 
> OK.
> 
>>
>>> @@ -23022,8 +23022,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>>          : CODE_FOR_altivec_vmrglh_direct),
>>>         {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
>>>       { OPTION_MASK_ALTIVEC,
>>> -      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct
>>> -       : CODE_FOR_altivec_vmrglw_direct),
>>> +      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>>> +       : CODE_FOR_altivec_vmrglw_direct_v4si),
>>
>> The correct way is to align the ? and the : (or put everything on one
>> line of course, if that fits)
>>
>> The parens around this are not needed btw, and are a distraction.
> 
> Changed.
> 
>>
>>> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>>> @@ -317,10 +317,10 @@ int main ()
>>>   /* { dg-final { scan-assembler-times "vctuxs" 2 } } */
>>>   /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */
>>> -/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */
>>> +/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */
>>>   /* { dg-final { scan-assembler-times "vmrghh" 8 } } */
>>> -/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */
>>> -/* { dg-final { scan-assembler-times "xxmrglw" 8 } } */
>>> +/* { dg-final { scan-assembler-times "xxmrghw" 4 } } */
>>> +/* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
>>>   /* { dg-final { scan-assembler-times "vmrglh" 8 } } */
>>>   /* { dg-final { scan-assembler-times "xxlnor" 6 } } */
>>>   /* { dg-final { scan-assembler-times {\mvpkudus\M} 1 } } */
>>> @@ -347,7 +347,7 @@ int main ()
>>>   /* { dg-final { scan-assembler-times "vspltb" 6 } } */
>>>   /* { dg-final { scan-assembler-times "vspltw" 0 } } */
>>>   /* { dg-final { scan-assembler-times "vmrgow" 8 } } */
>>> -/* { dg-final { scan-assembler-times "vmrglb" 5 { target le } } } */
>>> +/* { dg-final { scan-assembler-times "vmrglb" 4 { target le } } } */
>>>   /* { dg-final { scan-assembler-times "vmrglb" 6 { target be } } } */
>>>   /* { dg-final { scan-assembler-times "vmrgew" 8 } } */
>>>   /* { dg-final { scan-assembler-times "vsplth" 8 } } */
>>
>> Are those changes correct?  It looks like a vmrglb became a vmrghb, and
>> that 4 each of xxmrghw and xxmrglw disappeared?  Both seem wrong?
> 
> 
> This test is built with "-mdejagnu-cpu=power8 -O0 -mno-fold-gimple -dp",
> so the scan-assembler-times counts also include the instruction pattern
> names that -dp prints in the assembly comments.
> 
> 1) "vsx_xxmrghw_v4si" is replaced by "altivec_vmrglw_direct_v4si/0", so
> the xxmrghw count decreases from 8 to 4. (Likewise for vsx_xxmrglw_v4si.)
> 
>          li 9,48          # 1282 [c=4 l=4]  *movdi_internal64/3
> -       lxvd2x 0,31,9    # 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
> -       xxpermdi 0,0,0,2         # 32   [c=4 l=4]  xxswapd_v4si
> -       xxmrglw 0,0,12   # 33   [c=4 l=4]  vsx_xxmrghw_v4si
> +       lxvd2x 12,31,9   # 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
> +       xxpermdi 12,12,12,2      # 32   [c=4 l=4]  xxswapd_v4si
> +       xxmrglw 0,12,0   # 33   [c=4 l=4]  altivec_vmrglw_direct_v4si/0
>          xxpermdi 0,0,0,2         # 35   [c=4 l=4]  xxswapd_v4sf
> 
> Note that v0 and v12 are swapped in lxvd2x; the three new instructions
> produce the same result as before.
> 
> 2) "*altivec_vmrglb_internal" is replaced by "altivec_vmrghb_direct"
> with this patch, so the vmrglb count decreases from 5 to 4 and the vmrghb
> count increases from 5 to 6. (BYTES_BIG_ENDIAN is now checked early, at RTL
> generation, instead of at final output; this removes the UNSPECs so that
> the backend can potentially optimize these patterns.)
> 
>          li 9,928                 # 1424 [c=4 l=4]  *movdi_internal64/3
>          lxvd2x 0,31,9    # 416  [c=8 l=4]  *vsx_lxvd2x16_le_V16QI
> -       xxpermdi 33,0,0,2        # 417  [c=4 l=4]  xxswapd_v16qi
> +       xxpermdi 32,0,0,2        # 417  [c=4 l=4]  xxswapd_v16qi
>          li 9,944                 # 1425 [c=4 l=4]  *movdi_internal64/3
>          lxvd2x 0,31,9    # 418  [c=8 l=4]  *vsx_lxvd2x16_le_V16QI
> -       xxpermdi 32,0,0,2        # 419  [c=4 l=4]  xxswapd_v16qi
> -       vmrghb 0,0,1     # 420  [c=4 l=4]  *altivec_vmrglb_internal
> +       xxpermdi 33,0,0,2        # 419  [c=4 l=4]  xxswapd_v16qi
> +       vmrghb 0,1,0     # 420  [c=4 l=4]  altivec_vmrghb_direct
>          xxpermdi 0,32,32,2       # 421  [c=4 l=4]  xxswapd_v16qi
> 
> It does not seem necessary to also use \m and \M here to count only the
> assembly mnemonics?  The updated patch is attached.
> 
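
To make the counting effect in the quoted explanation concrete, here is a
small standalone sketch (not part of the patch or the testsuite).  It counts
regex matches in the two "-dp" lines for insn 420 quoted above.  The helper
count_matches is hypothetical, and ECMAScript "\b" is used only as an
approximation of Tcl's \m and \M, so treat it as an illustration rather
than as what DejaGnu does exactly.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// The two "-dp" output lines for insn 420, copied from the dumps above.
const std::string old_line
  = "vmrghb 0,0,1     # 420  [c=4 l=4]  *altivec_vmrglb_internal";
const std::string new_line
  = "vmrghb 0,1,0     # 420  [c=4 l=4]  altivec_vmrghb_direct";

// Hypothetical helper: count non-overlapping matches of PATTERN in TEXT,
// which is roughly how scan-assembler-times arrives at its number.
int count_matches (const std::string &text, const std::string &pattern)
{
  std::regex re (pattern);
  auto first = std::sregex_iterator (text.begin (), text.end (), re);
  return static_cast<int> (std::distance (first, std::sregex_iterator ()));
}
```

With a plain pattern, the old line contributes one "vmrghb" (the mnemonic)
and one "vmrglb" (the pattern name), while the new line contributes two
"vmrghb" and no "vmrglb".  With word boundaries only the mnemonic is counted
in either line.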

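For reference, the operand swap in the little-endian branch of the quoted
expander can be checked with a small standalone model.  This is only a
sketch under stated assumptions: registers are modelled as 16 bytes in
big-endian (ISA) order, and mergeh_le_elements and
le_mergeh_is_swapped_vmrglb are hypothetical names introduced for
illustration, not GCC functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A vector register modelled as 16 bytes in big-endian (ISA) order:
// index 0 is the leftmost byte of the register.
using V16 = std::array<std::uint8_t, 16>;

// ISA semantics of vmrghb: interleave the high (left) halves of a and b,
// i.e. the permute index {0, 16, 1, 17, ..., 7, 23}.
V16 vmrghb (const V16 &a, const V16 &b)
{
  V16 d{};
  for (int i = 0; i < 8; i++)
    {
      d[2 * i] = a[i];
      d[2 * i + 1] = b[i];
    }
  return d;
}

// ISA semantics of vmrglb: interleave the low (right) halves of a and b.
V16 vmrglb (const V16 &a, const V16 &b)
{
  V16 d{};
  for (int i = 0; i < 8; i++)
    {
      d[2 * i] = a[8 + i];
      d[2 * i + 1] = b[8 + i];
    }
  return d;
}

// Hypothetical reference: "merge high" in little-endian element numbering,
// where element j lives in register byte 15 - j.
V16 mergeh_le_elements (const V16 &a, const V16 &b)
{
  V16 d{};
  for (int i = 0; i < 8; i++)
    {
      d[15 - 2 * i] = a[15 - i];
      d[15 - (2 * i + 1)] = b[15 - i];
    }
  return d;
}

// Check that the LE element-order merge equals vmrglb with swapped operands.
bool le_mergeh_is_swapped_vmrglb ()
{
  V16 a{}, b{};
  for (int i = 0; i < 16; i++)
    {
      a[i] = static_cast<std::uint8_t> (i);
      b[i] = static_cast<std::uint8_t> (16 + i);
    }
  return mergeh_le_elements (a, b) == vmrglb (b, a);
}
```

In this model le_mergeh_is_swapped_vmrglb () returns true, which matches
emitting gen_altivec_vmrglb_direct with operands[2] and operands[1] swapped
when !BYTES_BIG_ENDIAN.
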
-- 
Thanks,
Xionghu