Subject: Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.
From: Carl Love
To: "Kewen.Lin", cel@us.ibm.com
Cc: Peter Bergner, Segher Boessenkool, gcc-patches@gcc.gnu.org, David Edelsohn
Date: Fri, 30 Jun 2023 15:20:45 -0700

Kewen:

On Fri, 2023-06-30 at 11:37 +0800, Kewen.Lin wrote:
> Hi Carl,
>
> on 2023/6/30 05:36, Carl Love wrote:
> > Kewen:
> >
> > On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > > > Yea, I was going with a runnable test and didn't include the
> > > > instruction counts.  Added back in.  Rather then doing by
> > > > processor version (P8, P9, P10) I was able to do it by BE/LE.
> > > > The instruction counts were the same for LE accross processor
> > > > versions but there are a few instruction counts that vary with
> > > > BE and LE.
> > >
> > > But the original test case only checks for cpu-types (processor
> > > version) but not for endianness, it means for the bif usages,
> > > there should not be different for endianness.  Why does this
> > > changes with your new test case?
> > > Could you have a further look and make it consistent with some
> > > adjustment if possible?  As we know, checking insn counts
> > > sometimes are fragile, so I think we should try our best to make
> > > it as robust as possible in the first place.
> > >
> > > Besides, the original case also have some differences between
> > > p7/p8 and p9.
> > >
> >
> > There are differences on P8 LE versus BE.  I did a diff between the
> > P8 and P9 tests:
> >
> > diff vsx-vector-6.p8.c vsx-vector-6.p9.c
> > 3,4c3,4
> > < /* { dg-require-effective-target powerpc_p8vector_ok } */
> > < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> > ---
> > > /* { dg-require-effective-target powerpc_p9vector_ok } */
> > > /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> > 12c12
> > < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
> > ---
> > > /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> > 23d22
> > < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> > 37c36
> > < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
> > ---
> > > /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> >
> > So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
> > xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are
> > different between the two architectures.  I then wrote a script to
> > compile the CPU specific test on Power 8, Power 9 and Power 10
> > architectures and then grep for the above list of instructions.
> > If I run the scrip on P8 BE and LE I get
> >
> >              Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
> >              (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
> > instruction  count         count       count       count       count
> > vperm        1             1           0           0           0
> > vpermr       0             0           0           0           0
> > xxpermr      0             0           1           0           1
> > xvmsubadp    1             0           1           1           1
> > xvmsubmdp    0             1           0           0           0
> > xvsubdp      1             1           1           1           1
> >
>
> Thanks for looking into this and making this statistics.
>
> Is there a typo for column nilram?  Otherwise, the below insn check
>
> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
>
> would fail there.

Yes, there is a typo in the nilram column.  The test generates a vperm
instruction.

#if defined (__BIG_ENDIAN__) || defined (_ARCH_PWR9)
  dst[8].d = vec_perm (src0[8].d, src1[8].d, src2[8].uc);
     f74:   e9 3f 00 78     ld      r9,120(r31)
     f78:   39 29 07 00     addi    r9,r9,1792
     f7c:   f5 89 00 01     lxv     vs12,0(r9)
     f80:   e9 3f 00 80     ld      r9,128(r31)
     f84:   39 29 07 00     addi    r9,r9,1792
     f88:   f4 09 00 01     lxv     vs0,0(r9)
     f8c:   e9 3f 00 88     ld      r9,136(r31)
     f90:   39 29 07 00     addi    r9,r9,1792
     f94:   f4 09 00 89     lxv     vs32,128(r9)
     f98:   e9 3f 00 70     ld      r9,112(r31)
     f9c:   39 29 07 00     addi    r9,r9,1792
     fa0:   f0 2c 64 91     xxmr    vs33,vs12
     fa4:   f1 a0 04 91     xxmr    vs45,vs0
     fa8:   10 01 68 2b     vperm   v0,v1,v13,v0
     ...

>
> >
> > I had played with putting -Wno-inline on the command line but that
> > didn't seem to make any difference.  However, you suggestion of
> > __attribute__ ((noipa)) does prevent the inlining and we don't get
> > the second copy of the instructions showing up.  The inlining
> > eliminated the LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.
>
> -Winline is a option for warning: "Warn if a function that is
> declared as inline cannot be inlined.", I think what you wanted is
> -fno-inline, and it's good to know noipa helps here.

Yea, my bad.  Didn't read the manual very carefully.

>
> > The instruction count test for xxlor in vsx-vector-6-func-2lop.c
> > differs on LE and BE vsx-vector-6-func-2op.c.
> > I believe the instruction is used with loads to reorder the data.
> > I don't see anyway to get around the extra xxlor instructions and
> > verify the vec_or builtin test generates the instruction.
> >
>
> OK, I'm still curious how the loads cause the difference.

Yea, looks like there is something screwy going on.

So, I started by running the test:

   make -j 1 && make check-gcc RUNTESTFLAGS="-v -v powerpc.exp=vsx-vector-6-func-2lop.c " > out

On Makalu, P8 BE, I verified the test gives 7 passes and no failures.
On Genoa, P8 LE, I also verified the test gives 7 passes and no
failures.

Then I went in and intentionally changed the expected counts down by
one for each platform.  The idea was to verify that the
dg-final { scan-assembler-times {\mxxlor\M} was being called.

On Makalu, I now get an error, as expected:

   check_cached_effective_target be: returning 1 for unix
   is-effective-target: be 1                 <<<< NOTE BE
   gcc.target/powerpc/vsx-vector-6-func-2lop.c: \\mxxlor\\M found 32 times
   FAIL: gcc.target/powerpc/vsx-vector-6-func-2lop.c scan-assembler-times \\mxxlor\\M 31

On Genoa, I now get the error, as expected:

   check_cached_effective_target le: returning 1 for unix
   is-effective-target: le 1
   gcc.target/powerpc/vsx-vector-6-func-2lop.c: \\mxxlor\\M found 22 times
   FAIL: gcc.target/powerpc/vsx-vector-6-func-2lop.c scan-assembler-times \\mxxlor\\M 21

So, running the tests, gcc definitely thinks there should be 22 xxlor
on LE and 32 on BE.

So, I went to look at the assembly to verify my comment on the
difference being related to the loads.  I decided to actually count
the instructions just to verify the number in the assembly files.
Before, I just looked at the assembly briefly but didn't dig in very
deep.
If I compile the tests and dump the assembly with:

   gcc -g -mcpu=power8 -o vsx-vector-6-func-2lop vsx-vector-6-func-2lop.c
   objdump -S -d vsx-vector-6-func-2lop > vsx-vector-6-func-2lop.dump
   grep xxlor vsx-vector-6-func-2lop.dump | wc
         4      28     192

So we see 4 xxlor instructions, not 32 as expected for BE or 22 as
expected for LE as the test claims.  I get the same count of 4 on both
makalu and on genoa.  I like this approach because you can easily see
the relationship of the source and assembly.

So, there seems to be something screwy here as that is not even close
to what the make script/scan-assembler thinks the counts should be.

Segher never liked the above way of looking at the assembly.  He
prefers:

   gcc -S -g -mcpu=power8 -o vsx-vector-6-func-2lop.s vsx-vector-6-func-2lop.c
   grep xxlor vsx-vector-6-func-2lop.s | wc
        34      68     516

So, again, I get the same count of 34 on both makalu and genoa.  But
again, that doesn't agree with what make script/scan-assembler thinks
the counts should be.

When I looked at the vsx-vector-6-func-2lop.s I see on BE:

   ....
        lxvd2x 0,10,9
        xxlor 0,12,0
        xxlnor 0,0,0
   ...

I was guessing that it was adjusting the data layout from the load.
But looking again more carefully versus LE:

   ....
        lxvd2x 0,31,9
        xxpermdi 0,0,0,2
        xxlor 0,12,0
        xxlnor 0,0,0
        xxpermdi 0,0,0,2
   ....

the xxpermdi is probably what is really doing the data layout change.

So, we have the issue that looking at the assembly gives different
instruction counts than what

   dg-final { scan-assembler-times {\mxxlor\M} }

comes up with???  Now I am really confused.  I don't know how
scan-assembler-times works but I will go see if I can find it and see
if I can figure out what the issue is.  I would expect that
scan-assembler is working off the --save-temps files, which get
deleted as part of the run.  I would guess that scan-assembler does a
grep to find the instructions and then maybe uses wc to count them???
I will go see if I can figure out how scan-assembler-times works.

                     Carl
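
P.S.  On the guess that scan-assembler-times is a grep plus wc: here is
a minimal sketch, in Python rather than the Tcl the testsuite is
written in, of how counting matching lines can differ from counting
whole-word regex matches such as {\mxxlor\M}.  The first three
instructions are the BE snippet from above; the xxlorc line and both
function names are made up for illustration, and Python's \b only
approximates Tcl's \m and \M.  This does not by itself explain the 4
and 34 versus 22 and 32 counts.

```python
import re

# First three instructions are the BE snippet quoted in the mail.
# The xxlorc line is hypothetical, added only to show the difference.
SAMPLE_ASM = """\
        lxvd2x 0,10,9
        xxlor 0,12,0
        xxlnor 0,0,0
        xxlorc 0,12,0
"""


def count_matching_lines(asm_text, needle):
    # Rough equivalent of `grep needle file | wc -l`:
    # counts lines that contain the substring anywhere.
    return sum(1 for line in asm_text.splitlines() if needle in line)


def count_whole_word_matches(asm_text, mnemonic):
    # Counts every occurrence of the mnemonic as a whole word.
    # \b stands in for the Tcl word-start/word-end constraints.
    pattern = re.compile(r"\b" + re.escape(mnemonic) + r"\b")
    return len(pattern.findall(asm_text))


if __name__ == "__main__":
    print("substring lines:", count_matching_lines(SAMPLE_ASM, "xxlor"))
    print("whole-word hits:", count_whole_word_matches(SAMPLE_ASM, "xxlor"))
```

A plain substring grep also picks up longer mnemonics that start with
xxlor, while the whole-word pattern does not, so the two methods need
not agree even when run on the same file.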