Message-ID: <03ee6809-d7b3-34a8-7149-28c9b92c71f1@linux.ibm.com>
Date: Fri, 24 Nov 2023 17:41:02 +0800
Subject: Re: [PATCH 0/4] Add vector pair support to PowerPC attribute((vector_size(32)))
To: Michael Meissner
Cc: Richard Biener, gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn, Peter Bergner
From: "Kewen.Lin"

on 2023/11/20 16:56, Michael Meissner wrote:
> On Mon, Nov 20, 2023 at 08:24:35AM +0100, Richard Biener wrote:
>> I wouldn't expose the "fake" larger modes to the vectorizer but rather
>> adjust m_suggested_unroll_factor (which you already do to some extent).
>
> Thanks.
> I figure I need to fix the shuffle bytes issue first and get a
> clean test run (with the flag enabled by default), before delving into the
> vectorization issues.
>
> But testing has shown that at least in the loop I was looking at, that using
> vector pair instructions (either through the built-ins I had previously posted
> or with these patches), that even if I turn off unrolling completely for the
> vector pair case, it still is faster than unrolling the loop 4 times for using
> vector types (or auto vectorization).  Note, of course the margin is much
> smaller in this case.
>
> vector double: (a * b) + c, unroll 4                     loop time: 0.55483
> vector double: (a * b) + c, unroll default               loop time: 0.55638
> vector double: (a * b) + c, unroll 0                     loop time: 0.55686
> vector double: (a * b) + c, unroll 2                     loop time: 0.55772
>
> vector32, w/vector pair: (a * b) + c, unroll 4           loop time: 0.48257
> vector32, w/vector pair: (a * b) + c, unroll 2           loop time: 0.50782
> vector32, w/vector pair: (a * b) + c, unroll default     loop time: 0.50864
> vector32, w/vector pair: (a * b) + c, unroll 0           loop time: 0.52224
>
> Of course being micro-benchmarks, it doesn't mean that this translates to the
> behavior on actual code.
>
>

I noticed that Ajit posted a patch that adds a new pass to replace vector
loads (lxv) from contiguous addresses with lxvp:

https://inbox.sourceware.org/gcc-patches/ef0c54a5-c35c-3519-f062-9ac78ee66b81@linux.ibm.com/

How about making this kind of rs6000-specific pass pair both vector loads
and stores?  Users can ask for more unrolling with parameters, and the
memory accesses that result from unrolling should be neat, so I'd expect
the pass to detect and pair the candidates easily.

BR,
Kewen
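
For readers following the thread, here is a minimal sketch of the kind of
(a * b) + c kernel the quoted timings refer to, written with a 32-byte
generic vector type.  It is not the benchmark from the quoted message: the
function name fma_vector32 and the use of memcpy for loads and stores are
assumptions made for illustration, and "vector32" is assumed to mean a
vector_size(32) type as in the subject line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 32-byte generic vector holding four doubles (GCC/Clang vector
   extension).  How the compiler lowers this type depends on the target
   and on the options in effect.  */
typedef double v4df __attribute__((vector_size(32)));

/* Hypothetical kernel: r[i] = a[i] * b[i] + c[i], processing four
   doubles per iteration.  n must be a multiple of 4.  */
static void
fma_vector32 (double *r, const double *a, const double *b,
              const double *c, size_t n)
{
  assert (n % 4 == 0);
  for (size_t i = 0; i < n; i += 4)
    {
      v4df va, vb, vc, vr;

      /* memcpy makes no alignment assumption about the caller's arrays;
         the compiler is free to turn each copy into a vector load.  */
      memcpy (&va, a + i, sizeof va);
      memcpy (&vb, b + i, sizeof vb);
      memcpy (&vc, c + i, sizeof vc);

      /* Element-wise multiply and add on the whole 32-byte vector.  */
      vr = va * vb + vc;

      memcpy (r + i, &vr, sizeof vr);
    }
}
```

Each iteration reads three contiguous 32-byte chunks and writes one, which
is the sort of regular access pattern a load/store pairing pass would have
to recognize once the loop is unrolled.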