From: "Kewen.Lin"
Date: Fri, 28 Jul 2023 16:59:56 +0800
Subject: Re: [PATCH] Optimize vec_splats of vec_extract for V2DI/V2DF (PR target/99293)
To: Michael Meissner
Cc: gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn, Peter Bergner

Hi Mike,

on 2023/7/11 03:50, Michael Meissner wrote:
> This patch optimizes cases like:
>
>	vector double v1, v2;
>	/* ... */
>	v2 = vec_splats (vec_extract (v1, 0);	/* or  */
>	v2 = vec_splats (vec_extract (v1, 1);
>
> Previously:
>
>	vector long long
>	splat_dup_l_0 (vector long long v)
>	{
>	  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
>	}
>
> would generate:
>
>	mfvsrld 9,34
>	mtvsrdd 34,9,9
>	blr
>
> With this patch, GCC generates:
>
>	xxpermdi 34,34,34,3
>	blr
>
> 2023-07-10  Michael Meissner
>
> gcc/
>
>	PR target/99293
>	* gcc/config/rs6000/vsx.md (vsx_splat_extract_): New combiner
>	insn.
>
> gcc/testsuite/
>
>	PR target/108958
>	* gcc.target/powerpc/pr99293.c: New test.
>	* gcc.target/powerpc/builtins-1.c: Update insn count.
> ---
>  gcc/config/rs6000/vsx.md                      | 18 ++++++
>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr99293.c    | 55 +++++++++++++++++++
>  3 files changed, 74 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c
>
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0c269e4e8d9..d34c3b21abe 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4600,6 +4600,24 @@ (define_insn "vsx_splat__mem"
>    "lxvdsx %x0,%y1"
>    [(set_attr "type" "vecload")])
>
> +;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element
> +(define_insn "*vsx_splat_extract_"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> +        (vec_duplicate:VSX_D
> +         (vec_select:
> +          (match_operand:VSX_D 1 "vsx_register_operand" "wa")
> +          (parallel [(match_operand 2 "const_0_to_1_operand" "n")]))))]
> +  "VECTOR_MEM_VSX_P (mode)"
> +{
> +  int which_word = INTVAL (operands[2]);
> +  if (!BYTES_BIG_ENDIAN)
> +    which_word = 1 - which_word;
> +
> +  operands[3] = GEN_INT (which_word ? 3 : 0);
> +  return "xxpermdi %x0,%x1,%x1,%3";
> +}
> +  [(set_attr "type" "vecperm")])
> +
>  ;; V4SI splat support
>  (define_insn "vsx_splat_v4si"
>    [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.c b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> index 28cd1aa6b1a..98783668bce 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> @@ -1035,4 +1035,4 @@ foo156 (vector unsigned short usa)
>  /* { dg-final { scan-assembler-times {\mvmrglb\M} 3 } } */
>  /* { dg-final { scan-assembler-times {\mvmrgew\M} 4 } } */
>  /* { dg-final { scan-assembler-times {\mvsplth|xxsplth\M} 4 } } */
> -/* { dg-final { scan-assembler-times {\mxxpermdi\M} 44 } } */
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 42 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c
> new file mode 100644
> index 00000000000..e5f44bd7346
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c
> @@ -0,0 +1,55 @@
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-O2 -mpower8-vector" } */

Nit: IMHO -mdejagnu-cpu=power8 is preferred over -mpower8-vector, which is
considered a workaround option that we plan to remove.

> +
> +/* Test for PR 99263, which wants to do:
> +        __builtin_vec_splats (__builtin_vec_extract (v, n))

Nit: Maybe remove all "__builtin_" prefixes, since vec_splats and vec_extract
are defined in PVIPR without __builtin_.  The same applies to the others below.

> +
> +   where v is a V2DF or V2DI vector and n is either 0 or 1.  Previously the GCC
> +   compiler would do a direct move to the GPR registers to select the item and a
> +   direct move from the GPR registers to do the splat.
> +
> +   Before the patch, splat_dup_ll_0 or splat_dup_dbl_0 below would generate:
> +
> +        mfvsrld 9,34
> +        mtvsrdd 34,9,9
> +        blr
> +
> +   and now it generates:
> +
> +        xxpermdi 34,34,34,3
> +        blr  */
> +
> +#include
> +
> +vector long long
> +splat_dup_ll_0 (vector long long v)
> +{
> +  /* xxpermdi 34,34,34,3 */
> +  return __builtin_vec_splats (vec_extract (v, 0));
> +}
> +
> +vector double
> +splat_dup_dbl_0 (vector double v)
> +{
> +  /* xxpermdi 34,34,34,3 */
> +  return __builtin_vec_splats (vec_extract (v, 0));
> +}
> +
> +vector long long
> +splat_dup_ll_1 (vector long long v)
> +{
> +  /* xxpermdi 34,34,34,0 */
> +  return __builtin_vec_splats (vec_extract (v, 1));
> +}
> +
> +vector double
> +splat_dup_dbl_1 (vector double v)
> +{
> +  /* xxpermdi 34,34,34,0 */
> +  return __builtin_vec_splats (vec_extract (v, 1));
> +}
> +
> +/* { dg-final { scan-assembler-times "xxpermdi" 4 } } */

Nit: It's good to add \m..\M like the others, i.e.

/* { dg-final { scan-assembler-times {\mxxpermdi\M} 4 } } */

..., and the same for the ones below.

> +/* { dg-final { scan-assembler-not "mfvsrd" } } */
> +/* { dg-final { scan-assembler-not "mfvsrld" } } */
> +/* { dg-final { scan-assembler-not "mtvsrdd" } } */

This patch is okay for trunk with these nits tweaked, thanks!

BR,
Kewen
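
For anyone following the endianness adjustment in the quoted insn, here is a
minimal sketch in portable C of why element 0 maps to an xxpermdi immediate of
3 on little-endian and 0 on big-endian.  It is only a model, not GCC code: the
names vsx_reg, model_xxpermdi, splat_extract_dm and load_v2di are hypothetical
and exist purely for illustration, and the register is modelled as two
doublewords using the ISA numbering (doubleword 0 is the most significant).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register: dw[0] is the
   most-significant doubleword, following the ISA numbering.  */
typedef struct { uint64_t dw[2]; } vsx_reg;

/* Model of "xxpermdi XT,XA,XB,DM": the high DM bit selects the
   doubleword taken from XA, the low DM bit the one taken from XB.  */
vsx_reg
model_xxpermdi (vsx_reg xa, vsx_reg xb, unsigned dm)
{
  vsx_reg xt;
  xt.dw[0] = xa.dw[(dm >> 1) & 1];
  xt.dw[1] = xb.dw[dm & 1];
  return xt;
}

/* Mirrors the arithmetic in the quoted output template: flip the
   element number on little-endian, then pick DM = 3 or DM = 0.
   The endianness is a parameter here (an assumption of this sketch);
   in the insn it comes from BYTES_BIG_ENDIAN.  */
unsigned
splat_extract_dm (int element, bool bytes_big_endian)
{
  /* The insn only accepts a constant 0 or 1 here.  */
  assert (element == 0 || element == 1);

  int which_word = element;
  if (!bytes_big_endian)
    which_word = 1 - which_word;
  return which_word ? 3 : 0;
}

/* Hypothetical helper: place elements e0/e1 of a two-element vector
   into the register model.  On little-endian, element 0 lives in the
   least-significant doubleword, i.e. dw[1].  */
vsx_reg
load_v2di (uint64_t e0, uint64_t e1, bool bytes_big_endian)
{
  vsx_reg r;
  r.dw[0] = bytes_big_endian ? e0 : e1;
  r.dw[1] = bytes_big_endian ? e1 : e0;
  return r;
}
```

Feeding the same register to both inputs, as the insn does with %x1,%x1,
copies the selected doubleword into both halves of the result, which is the
splat.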