Date: Fri, 30 Jun 2023 11:37:21 +0800
Subject: Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.
To: Carl Love
Cc: Peter Bergner, Segher Boessenkool, gcc-patches@gcc.gnu.org, David Edelsohn
References: <5197d0d8ab5e975ed7e1e928820769e5921f5796.camel@us.ibm.com>
 <621ac0734ae83c7ca6af00d804a3d3bc2bbbea5b.camel@us.ibm.com>
 <3f8d0bdc-bddf-d178-ee76-8d41c4b8755f@linux.ibm.com>
 <4d3135956e493fc12311af8fce18d043769ab7a4.camel@us.ibm.com>
From: "Kewen.Lin"
In-Reply-To: <4d3135956e493fc12311af8fce18d043769ab7a4.camel@us.ibm.com>

Hi Carl,

on 2023/6/30 05:36, Carl Love wrote:
> Kewen:
>
> On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
>>> Yea, I was going with a runnable test and didn't include the
>>> instruction counts. Added back in. Rather than doing by processor
>>> version (P8, P9, P10) I was able to do it by BE/LE. The instruction
>>> counts were the same for LE across processor versions but there are a
>>> few instruction counts that vary with BE and LE.
>>
>> But the original test case only checks for cpu-types (processor version)
>> but not for endianness, which means that for the bif usages there should
>> be no difference between endiannesses. Why does this change with your
>> new test case? Could you have a further look and make it consistent with
>> some adjustment if possible? As we know, checking insn counts is
>> sometimes fragile, so I think we should try our best to make it as
>> robust as possible in the first place.
>>
>> Besides, the original case also has some differences between p7/p8 and
>> p9.
>>
>
> There are differences on P8 LE versus BE. I did a diff between the P8
> and P9 tests:
>
> diff vsx-vector-6.p8.c vsx-vector-6.p9.c
> 3,4c3,4
> < /* { dg-require-effective-target powerpc_p8vector_ok } */
> < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> ---
>> /* { dg-require-effective-target powerpc_p9vector_ok } */
>> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> 12c12
> < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> 23d22
> < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> 37c36
> < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
>
> So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
> xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are different
> between the two architectures. I then wrote a script to compile the
> CPU specific test on Power 8, Power 9 and Power 10 architectures and
> then grep for the above list of instructions.
> If I run the script on P8 BE and LE I get:
>
>              Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
>              (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
> instruction  count         count       count       count       count
> vperm        1             1           0           0           0
> vpermr       0             0           0           0           0
> xxpermr      0             0           1           0           1
> xvmsubadp    1             0           1           1           1
> xvmsubmdp    0             1           0           0           0
> xvsubdp      1             1           1           1           1
>

Thanks for looking into this and compiling these statistics. Is there a
typo in the nilram column? Otherwise, the insn check below

  /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */

would fail there.

>
> From the diff we see
>
> { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }
>
> This test picks up the correct subtraction instruction for LE versus BE
> so this "masks" the LE/BE difference. I changed the check in
> vsx-vector-6-func-3op.c to match. This eliminates the LE and BE checks
> and reduces the number of specific checks.

OK, nice.

>
> In vsx-vector-6-func-3op.c the new test checks the counts for
> xxpermdi, which the original test does not check. The checks for
> xxpermdi are not needed. They are not directly related to the builtin
> tests. I removed them.

OK.

>
> Looking at the LE/BE checks in the other test file
> vsx-vector-6-func-2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were
> not checked in the original test. The functions where these
> instructions are used get inlined. On LE, the binary instructions show
> up in the inlined code as well as what appears to be the binary for the
> original, non-inlined function. Best I can see, the binary for the
> original function is dead code. I don't see any calls to it. Seems
> like it shouldn't be there, as removing it would make the binary
> smaller. On BE, I don't see the binary for the original non-inlined
> function.
>
> I had played with putting -Wno-inline on the command line but that
> didn't seem to make any difference. However, your suggestion of
> __attribute__ ((noipa)) does prevent the inlining and we don't get the
> second copy of the instructions showing up.
> Preventing the inlining eliminated the LE/BE differences for xvmaxsp,
> xvminsp and xvmaxdp.

-Winline is a warning option: "Warn if a function that is declared as
inline cannot be inlined." I think what you wanted is -fno-inline, and
it's good to know noipa helps here.

>
> The instruction count test for xxlor in vsx-vector-6-func-2lop.c
> differs on LE and BE vsx-vector-6-func-2op.c. I believe the
> instruction is used with loads to reorder the data. I don't see any way
> to get around the extra xxlor instructions and verify the vec_or
> builtin test generates the instruction.
>

OK, I'm still curious how the loads cause the difference.

> I was able to eliminate all of the LE/BE qualifiers in the instruction
> counts with the exception of xxlor. By using the same checks that look
> for multiple versions of xvmsub*, as was done in the original test, we
> can also eliminate LE/BE specific tests and account for different
> instructions across CPU versions. We could go back to checking for
> specific instructions being generated on Power 8, Power 9, Power 10 if
> you prefer not using checks that cover multiple flavors of a given
> instruction across different CPU types.
>
> FYI, I eliminated the function call to do the various tests. Instead,
> I modified the macro to generate a function call to do the test and
> check the results.

Got it, I'll review the latest revision soon, thanks.

BR,
Kewen
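
For reference, a minimal sketch of the two points discussed above:
marking the function under test with __attribute__ ((noipa)) so that only
one copy of its body is emitted, and using a single [am] pattern so one
count check covers both xvmsubadp and xvmsubmdp. This is not taken from
the actual vsx-vector-6-func-*.c files. The names v2df, test_msub and
check_msub are hypothetical, the generic vector_size type stands in for
the real vec_* builtins, and the expected count of 1 is an assumption
that has not been checked on Power hardware.

```c
/* Sketch only; hypothetical test layout, not the real testcase.  */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

/* Generic two-lane double vector (GCC vector extension).  */
typedef double v2df __attribute__ ((vector_size (16)));

/* noipa disables inlining and other interprocedural optimizations,
   so the multiply-subtract appears once, in this function only,
   rather than also being copied into the caller.  */
__attribute__ ((noipa)) v2df
test_msub (v2df a, v2df b, v2df c)
{
  return a * b - c;
}

/* Caller standing in for the result-checking code of a real test.
   Returns 1 when both lanes hold the expected values.  */
int
check_msub (void)
{
  v2df a = { 2.0, 4.0 };
  v2df b = { 3.0, 5.0 };
  v2df c = { 1.0, 2.0 };
  v2df r = test_msub (a, b, c);
  return r[0] == 5.0 && r[1] == 18.0;
}

/* Assumed expectation: one fused multiply-subtract, in whichever of
   the "a" or "m" forms the register allocator happens to pick.  */
/* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
```

Without noipa (or with only -Wno-inline, which just silences a warning),
the caller may receive an inlined copy of the operation and the count
could differ between configurations.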