From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 24 Apr 2024 16:24:23 +0800
Subject: Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b
To: Alexandre Oliva
Cc: Rainer Orth, Mike Stump, David Edelsohn, Segher Boessenkool, Kewen Lin, gcc-patches@gcc.gnu.org
From: "Kewen.Lin"

Hi,

on 2024/4/22 17:28, Alexandre Oliva wrote:
> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566525.html
>
>
> This test expects vectorization at power8+ because strict alignment is
> not required for vectors. For power7, vectorization is not to take
> place because it's not deemed profitable: 12 iterations would be
> required to make it so.
>
> But for power6 and below, the test's 10 iterations are enough to make
> vectorization profitable, but the test doesn't expect this. Assuming
> the decision is indeed appropriate, I'm adjusting the expectations.

For the record, the cost difference between power6 and power7 lies in the
cost of vec_perm:

  * p6 *

ic[i_23] 2 times vector_stmt costs 2 in prologue
ic[i_23] 1 times vector_stmt costs 1 in prologue
ic[i_23] 1 times vector_load costs 2 in body
ic[i_23] 1 times vec_perm costs 1 in body

vs.

  * p7 *

ic[i_23] 2 times vector_stmt costs 2 in prologue
ic[i_23] 1 times vector_stmt costs 1 in prologue
ic[i_23] 1 times vector_load costs 2 in body
ic[i_23] 1 times vec_perm costs 3 in body

This in turn causes the difference in the minimum number of iterations
needed for profitability.

>
>
> for gcc/testsuite/ChangeLog
>
> * gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust
> expectations for cpus below power7.
> ---
>  .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> index cbbfbb24658f8..0dab2c08acdb4 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> @@ -46,9 +46,10 @@ int main (void)
>    return 0;
>  }
>
> -/* Peeling to align the store is used. Overhead of peeling is too high. */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { vector_alignment_reachable && {! vect_no_align} } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { vector_alignment_reachable && {! vect_hw_misalign} } } } } */
> +/* Peeling to align the store is used. Overhead of peeling is too high
> +   for power7, but acceptable for earlier architectures. */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { has_arch_pwr7 && { vector_alignment_reachable && {! vect_no_align} } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { has_arch_pwr7 && { vector_alignment_reachable && {! vect_hw_misalign} } } } } } */
>
>  /* Versioning to align the store is used. Overhead of versioning is not too high. */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_no_align || {! vector_alignment_reachable} } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_no_align || { {! vector_alignment_reachable} || {! has_arch_pwr7 } } } } } } */

For the !has_arch_pwr7 case, it still adopts peeling, but as the comment
(one line above) shows, the original intention of this case is to expect
peeling to be not profitable, so it's not expected to be handled here.
Can we just tweak the loop bound instead, such as:

-#define N 14
+#define N 13
 #define OFF 4

?  That would make this loop not profitable to vectorize for !vect_no_align
with peeling (on both pwr7 and pwr6) and keep the expectations consistent.

BR,
Kewen
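
As a rough illustration of why a higher vec_perm cost raises the minimum
iteration count, here is a minimal sketch of a break-even calculation. It is
a simplified model, not the vectorizer's actual formula: the function name
min_profitable_iters and all of its parameters are hypothetical, and peeling
and epilogue effects are ignored. It only assumes that vectorization pays off
once outside_cost + body_cost * n / vf < scalar_iter_cost * n.

```c
#include <assert.h>

/* Hypothetical, simplified break-even model (NOT GCC's real computation).
   Returns the smallest iteration count n for which
       vec_outside_cost + vec_body_cost * n / vf < scalar_iter_cost * n
   holds, or -1 if the vector body never beats the scalar loop.  */
int
min_profitable_iters (int scalar_iter_cost, int vec_body_cost,
                      int vec_outside_cost, int vf)
{
  assert (vf > 0 && scalar_iter_cost > 0);
  assert (vec_body_cost >= 0 && vec_outside_cost >= 0);

  /* Multiply the inequality by vf to stay in integer arithmetic:
       vec_outside_cost * vf < n * (scalar_iter_cost * vf - vec_body_cost)  */
  int gain_per_iter = scalar_iter_cost * vf - vec_body_cost;
  if (gain_per_iter <= 0)
    return -1;

  /* Smallest n with n * gain_per_iter strictly above the outside cost.  */
  return (vec_outside_cost * vf) / gain_per_iter + 1;
}
```

With made-up inputs (scalar cost 2 per iteration, outside cost 10, vf 4),
a body cost of 3 gives a threshold of 9 iterations, while a body cost of 5
(the same body with vec_perm costing 3 instead of 1) gives 14. The numbers
are illustrative only, but they show how a two-unit difference in one body
statement can move a loop bound across the profitability threshold.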