Date: Mon, 27 Nov 2023 23:26:53 -0500
From: Michael Meissner
To: "Kewen.Lin"
Cc: Michael Meissner, Richard Biener, gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn, Peter Bergner
Subject: Re: [PATCH 0/4] Add vector pair support to PowerPC attribute((vector_size(32)))
In-Reply-To: <03ee6809-d7b3-34a8-7149-28c9b92c71f1@linux.ibm.com>

On Fri, Nov 24, 2023 at 05:41:02PM +0800, Kewen.Lin wrote:
> on 2023/11/20 16:56, Michael Meissner wrote:
> > On Mon, Nov 20, 2023 at 08:24:35AM +0100, Richard Biener wrote:
> >> I wouldn't expose the "fake" larger modes to the vectorizer but rather
> >> adjust m_suggested_unroll_factor (which you already do to some extent).
> >
> > Thanks.
> > I figure I first need to fix the shuffle bytes issue and get a clean
> > test run (with the flag enabled by default) before delving into the
> > vectorization issues.
> >
> > But testing has shown that, at least in the loop I was looking at, using
> > vector pair instructions (either through the built-ins I had previously
> > posted or with these patches) is still faster than unrolling the loop 4
> > times with vector types (or auto-vectorization), even if I turn off
> > unrolling completely for the vector pair case. Of course, the margin is
> > much smaller in that case.
> >
> > vector double:           (a * b) + c, unroll 4       loop time: 0.55483
> > vector double:           (a * b) + c, unroll default loop time: 0.55638
> > vector double:           (a * b) + c, unroll 0       loop time: 0.55686
> > vector double:           (a * b) + c, unroll 2       loop time: 0.55772
> >
> > vector32, w/vector pair: (a * b) + c, unroll 4       loop time: 0.48257
> > vector32, w/vector pair: (a * b) + c, unroll 2       loop time: 0.50782
> > vector32, w/vector pair: (a * b) + c, unroll default loop time: 0.50864
> > vector32, w/vector pair: (a * b) + c, unroll 0       loop time: 0.52224
> >
> > Of course, these are micro-benchmarks, so the results do not necessarily
> > translate to the behavior of actual code.
>
> I noticed that Ajit posted a patch adding a new pass that replaces vector
> loads (lxv) from contiguous addresses with lxvp:
>
> https://inbox.sourceware.org/gcc-patches/ef0c54a5-c35c-3519-f062-9ac78ee66b81@linux.ibm.com/
>
> How about making this kind of rs6000-specific pass pair both vector loads
> and stores? Users can unroll more via parameters, and the memory accesses
> produced by unrolling should be neatly laid out, so I'd expect the pass to
> easily detect and pair the candidates.

Yes, I tend to think a combination of things will be needed. In my tests
with a saxpy-type loop, I could not get the current built-ins that load and
store vector pairs to be fast enough.
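For reference, a saxpy-type kernel of the `(a * b) + c` form timed above might look like the following when written with GCC's generic vector extension and a 32-byte vector of doubles, which is the width of a Power10 vector pair. This is only an illustrative sketch, not the benchmark behind the quoted numbers; the names `v4df_t` and `fma_loop_v32` are hypothetical. Whether GCC keeps such values in vector pairs and emits lxvp/stxvp for them depends on the patches under discussion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical typedef: 32 bytes hold 4 doubles, the size of a Power10
   vector pair (two consecutive 16-byte VSX registers).  */
typedef double v4df_t __attribute__((vector_size(32)));

/* Hypothetical kernel: r[i] = a[i] * b[i] + c[i] for i in [0, n).  */
void fma_loop_v32(double *r, const double *a, const double *b,
                  const double *c, size_t n)
{
  size_t i = 0;

  /* Main loop: one 32-byte vector (4 doubles) per iteration.  */
  for (; i + 4 <= n; i += 4)
    {
      v4df_t va, vb, vc, vr;

      /* memcpy avoids alignment and aliasing assumptions about the
         double arrays.  */
      memcpy(&va, a + i, sizeof va);
      memcpy(&vb, b + i, sizeof vb);
      memcpy(&vc, c + i, sizeof vc);

      /* Element-wise multiply-add on all four lanes.  */
      vr = va * vb + vc;

      memcpy(r + i, &vr, sizeof vr);
    }

  /* Scalar tail for the remaining 0 to 3 elements.  */
  for (; i < n; i++)
    r[i] = a[i] * b[i] + c[i];
}
```

The scalar tail covers lengths that are not a multiple of 4. Using memcpy rather than pointer casts keeps the sketch free of alignment assumptions; GCC typically lowers these fixed-size copies to ordinary vector loads and stores.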
Peter's posted code helped, but ultimately it was still slower than using
vector_size(32). I will try out the patch and compare it to my patches.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com