Date: Tue, 9 Apr 2024 11:51:00 +0200
From: Stefan Schulze Frielinghaus
To: Juergen Christ
Cc: gcc-patches@gcc.gnu.org, krebbel@linux.ibm.com
Subject: Re: [PATCH] s390x: Optimize vector permute with constant indexes
References: <20240402075601.7733-1-jchrist@linux.ibm.com>
In-Reply-To: <20240402075601.7733-1-jchrist@linux.ibm.com>

On Tue, Apr 02, 2024 at 09:56:01AM +0200, Juergen Christ wrote:
> Loop vectorizer can generate vector permutes with constant indexes
> where all indexes are equal. Optimize this case to use vector
> replicate instead of vector permute.
>
> gcc/ChangeLog:
>
> 	* config/s390/s390.cc (expand_perm_as_replicate): Implement.
> 	(vectorize_vec_perm_const_1): Call new function.
> 	* config/s390/vx-builtins.md (vec_splat): Change to...
> 	(@vec_splat): ...this.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/s390/vector/vec-expand-replicate.c: New test.
>
> Bootstrapped and regtested on s390x. Ok for trunk?
>
> Signed-off-by: Juergen Christ
> ---
>  gcc/config/s390/s390.cc                       | 32 +++++++++++++++++++
>  gcc/config/s390/vx-builtins.md                |  2 +-
>  .../s390/vector/vec-expand-replicate.c        | 30 +++++++++++++++++
>  3 files changed, 63 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
>
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index 372a23244032..4b4014ebe444 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -17923,6 +17923,35 @@ expand_perm_as_a_vlbr_vstbr_candidate (const struct expand_vec_perm_d &d)
>    return false;
>  }
>
> +static bool expand_perm_as_replicate (const struct expand_vec_perm_d &d)
               ^~~~~~~~~~~~~~~~~~~~~~~~
Function names start on a new line.

> +{
> +  unsigned char i;
> +  unsigned char elem;
> +  rtx base = d.op0;
> +  rtx insn;
> +  /* Needed to silence maybe-uninitialized warning. */
> +  gcc_assert(d.nelt > 0);
     ~~~~~~~~~~^~~~~~~~~~~~
Between function name and open bracket whitespace is missing. Curiously
enough, the error is about d which is a reference and cannot be null.
If you are eager you could reduce this and open a PR.

s390.cc:17935:8: warning: ‘d’ may be used uninitialized [-Wmaybe-uninitialized]
17935 |   elem = d.perm[0];
      |   ~~~~~^~~~~~~~~~~

> +  elem = d.perm[0];
> +  for (i = 1; i < d.nelt; ++i)
> +    if (d.perm[i] != elem)
> +      return false;
> +  if (!d.testing_p)
> +    {
> +      if (elem >= d.nelt)
> +        {
> +          base = d.op1;
> +          elem -= d.nelt;
> +        }
> +      insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
> +      if (insn == NULL_RTX)
> +        return false;
> +      emit_insn (insn);
> +      return true;
> +    }
> +  else
> +    return maybe_code_for_vec_splat (d.vmode) != CODE_FOR_nothing;
> +}
> +
>  /* Try to find the best sequence for the vector permute operation
>     described by D. Return true if the operation could be
>     expanded. */
> @@ -17941,6 +17970,9 @@ vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
>    if (expand_perm_as_a_vlbr_vstbr_candidate (d))
>      return true;
>
> +  if (expand_perm_as_replicate(d))
         ~~~~~~~~~~~~~~~~~~~~~~~~^~~
Between function name and open bracket whitespace is missing.

> +    return true;
> +
>    return false;
>  }
>
> diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
> index 432d81a719fc..93c0d408a43e 100644
> --- a/gcc/config/s390/vx-builtins.md
> +++ b/gcc/config/s390/vx-builtins.md
> @@ -424,7 +424,7 @@
>
>
>  ; Replicate from vector element
> -(define_expand "vec_splat"
> +(define_expand "@vec_splat"
>    [(set (match_operand:V_HW 0 "register_operand" "")
>     (vec_duplicate:V_HW (vec_select:
>      (match_operand:V_HW 1 "register_operand" "")
> diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> new file mode 100644
> index 000000000000..27563a00f22b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> @@ -0,0 +1,30 @@
> +/* Check that the vectorize_vec_perm_const expander correctly deals with
> +   replication. Extracted from spec "nab". */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mzarch -march=z13 -fvect-cost-model=unlimited" } */
> +
> +
> +#define REAL_T double
> +typedef REAL_T MATRIX_T[ 4 ][ 4 ];
> +
> +int concat_mat_i, concat_mat_j;
> +static void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3);
> +MATRIX_T *rot4p() {
> +  MATRIX_T mat3, mat4;
> +  static MATRIX_T mat5;
> +  concat_mat(mat4, mat3, mat5);
> +}
> +void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3) {
> +  int k;
> +  for (;; concat_mat_i++) {
> +    concat_mat_j = 0;
> +    for (; 4; concat_mat_j++) {
> +      k = 0;
> +      for (; k < 4; k++)
> +        m3[concat_mat_i][concat_mat_j] += m1[concat_mat_i][k];
> +    }

Just nitpicking: if we could come up with a test case which does not
involve integer overflows due to non-terminating loops, I would prefer
that.
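
One possible direction, as an untested sketch: bound all three loops so
that nothing runs forever or overflows. The two-argument concat_mat
signature below is made up for illustration, and whether the vectorizer
still produces the all-equal permute for this variant would have to be
checked on s390x before it could replace the reduced test.

```c
#define REAL_T double
typedef REAL_T MATRIX_T[4][4];

/* Hypothetical terminating variant of the reduced test: every entry of
   row i in m3 accumulates the sum of row i of m1.  All loop bounds are
   constant, so no counter can overflow.  */
void
concat_mat (MATRIX_T m1, MATRIX_T m3)
{
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      for (int k = 0; k < 4; k++)
        m3[i][j] += m1[i][k];
}
```

The dg directives from the original test would stay unchanged.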

Cheers,
Stefan

> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "vperm" } } */
> --
> 2.39.3
>
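
For readers outside the GCC tree, the check the patch performs can be
sketched in plain C. This is only an illustration and not the code under
review: perm_is_replicate and its parameters are hypothetical names, and
a plain index array stands in for expand_vec_perm_d. It also follows the
two formatting points raised above (function name on its own line, space
before the open bracket).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the check in the patch.  Return true if all
   NELT indexes in PERM are equal, i.e. the permute replicates a single
   element.  On success, *OPERAND is 0 or 1 (first or second input
   vector) and *ELEM is the element index within that vector.  */
bool
perm_is_replicate (const unsigned char *perm, unsigned nelt,
                   unsigned *operand, unsigned *elem)
{
  if (nelt == 0)
    return false;

  for (unsigned i = 1; i < nelt; ++i)
    if (perm[i] != perm[0])
      return false;

  /* Indexes 0 .. nelt-1 select from the first operand, indexes
     nelt .. 2*nelt-1 from the second one.  */
  if (perm[0] >= nelt)
    {
      *operand = 1;
      *elem = perm[0] - nelt;
    }
  else
    {
      *operand = 0;
      *elem = perm[0];
    }
  return true;
}
```

With four elements, the index vector {5, 5, 5, 5} selects element 1 of
the second operand, while {0, 1, 0, 0} is not a replicate.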