Message-ID: <4d3135956e493fc12311af8fce18d043769ab7a4.camel@us.ibm.com>
Subject: Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.
From: Carl Love
To: "Kewen.Lin"
Cc: Peter Bergner, Segher Boessenkool, gcc-patches@gcc.gnu.org, David Edelsohn
Date: Thu, 29 Jun 2023 14:36:02 -0700
In-Reply-To: <3f8d0bdc-bddf-d178-ee76-8d41c4b8755f@linux.ibm.com>
References: <5197d0d8ab5e975ed7e1e928820769e5921f5796.camel@us.ibm.com> <621ac0734ae83c7ca6af00d804a3d3bc2bbbea5b.camel@us.ibm.com> <3f8d0bdc-bddf-d178-ee76-8d41c4b8755f@linux.ibm.com>

Kewen:

On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > Yea, I was going with a runnable test and didn't include the
> > instruction counts.  Added back in.  Rather then doing by processor
> > version (P8, P9, P10) I was able to do it by BE/LE.  The instruction
> > counts were the same for LE accross processor versions but there are a
> > few instruction counts that vary with BE and LE.
>
> But the original test case only checks for cpu-types (processor version)
> but not for endianness, it means for the bif usages, there should not be
> different for endianness.  Why does this changes with your new test case?
> Could you have a further look and make it consistent with some adjustment
> if possible?  As we know, checking insn counts sometimes are fragile, so
> I think we should try our best to make it as robust as possible in the
> first place.
>
> Besides, the original case also have some differences between p7/p8 and
> p9.
>

There are differences on P8 LE versus BE.

I did a diff between the P8 and P9 tests:

   diff vsx-vector-6.p8.c vsx-vector-6.p9.c
   3,4c3,4
   < /* { dg-require-effective-target powerpc_p8vector_ok } */
   < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
   ---
   > /* { dg-require-effective-target powerpc_p9vector_ok } */
   > /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
   12c12
   < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
   ---
   > /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
   23d22
   < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
   37c36
   < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
   ---
   > /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */

So we can see that the instruction count checks for vperm, vpermr,
xxpermr, xvmsubadp, xvmsubmdp and xvsubdp differ between the two
architectures.

I then wrote a script to compile the CPU specific test on Power 8,
Power 9 and Power 10 architectures and then grep for the above list of
instructions.
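
A minimal sketch of that kind of per-instruction counting, for
illustration only. It is not the script that was actually used: the
Python function name and the sample assembly text are made up, and \b is
used as a rough stand-in for the \m and \M word anchors in the dg-final
regexes.

```python
import re

# Instructions whose count checks differ between the P8 and P9 tests.
MNEMONICS = ["vperm", "vpermr", "xxpermr", "xvmsubadp", "xvmsubmdp", "xvsubdp"]


def count_mnemonics(asm_text, mnemonics=MNEMONICS):
    """Count whole-word occurrences of each mnemonic in assembly text."""
    counts = {}
    for name in mnemonics:
        # \b keeps "vperm" from also matching inside "vpermr".
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        counts[name] = len(pattern.findall(asm_text))
    return counts


# Hypothetical assembly fragment, invented for this example.
SAMPLE_ASM = """
	xvmsubmdp 34,35,36
	xvsubdp 34,34,35
	vperm 2,2,3,4
"""

if __name__ == "__main__":
    for name, count in count_mnemonics(SAMPLE_ASM).items():
        print(f"{name:10s} {count}")
```

A combined pattern such as xvmsub[am]dp counts either variant, so one
check can cover a system that emits xvmsubadp and one that emits
xvmsubmdp.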

If I run the script on the BE and LE systems I get:

              Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
              (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
instruction   count         count       count       count       count
vperm         1             1           0           0           0
vpermr        0             0           0           0           0
xxpermr       0             0           1           0           1
xvmsubadp     1             0           1           1           1
xvmsubmdp     0             1           0           0           0
xvsubdp       1             1           1           1           1

From the diff we see

   { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }

This test picks up the correct subtraction instruction for LE versus BE,
so it "masks" the LE/BE difference.  I changed the check in
vsx-vector-6-func-3op.c to match.  This eliminates the LE and BE checks
and reduces the number of specific checks.

In vsx-vector-6-func-3op.c, the new test checks the counts for xxpermdi,
which the original test does not check.  The checks for xxpermdi are not
needed; they are not directly related to the builtin tests, so I removed
them.

Looking at the LE/BE checks in the other test file
vsx-vector-6-func-2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were
not checked in the original test.  The functions where these instructions
are used get inlined.  On LE, the instructions show up in the inlined
code as well as in what appears to be the binary for the original,
non-inlined function.  Best I can see, the binary for the original
function is dead code; I don't see any calls to it.  Seems like it
shouldn't be there, as removing it would make the binary smaller.  On BE,
I don't see the binary for the original non-inlined function.  I had
played with putting -Wno-inline on the command line but that didn't seem
to make any difference.  However, your suggestion of
__attribute__ ((noipa)) does prevent the inlining and we don't get the
second copy of the instructions showing up.  Preventing the inlining
eliminated the LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.

The instruction count test for xxlor in vsx-vector-6-func-2lop.c differs
on LE and BE.  I believe the instruction is used with loads to reorder
the data.
I don't see any way to get around the extra xxlor instructions and still
verify that the vec_or builtin test generates the instruction.

I was able to eliminate all of the LE/BE qualifiers in the instruction
counts with the exception of xxlor.  By using the same checks that look
for multiple versions of xvmsub*, as was done in the original test, we
can also eliminate LE/BE specific tests and account for different
instructions across CPU versions.

We could go back to checking for specific instructions being generated on
Power 8, Power 9 and Power 10 if you prefer not to use checks that cover
multiple flavors of a given instruction across different CPU types.

FYI, I eliminated the function call to do the various tests.  Instead, I
modified the macro to generate a function call to do the test and check
the results.

                      Carl